I. INTRODUCTION



One of the problems currently being studied by the 
Naval Research Laboratory (NRL) involves the processing of 
frequent and complex messages from satellites. The 
processing of these messages requires a high percentage of
bit manipulations, which use a large amount of central
processing unit (CPU) time. The currently available
computers do not have sufficient capability to perform this
processing in a timely manner. There are several options
available to the NRL for improving the situation. One
option is the use of a very fast computer; however, the
cost of such a computer is very high. The purpose of this
project is to evaluate another, less costly option: using an
automatic microcode generating system (AMGS).

JRS Research Laboratories Inc. has developed an AMGS 
which generates microcode for the writeable control store 
(WCS) on the VAX 11/780. The JRS AMGS was developed to 
provide a low cost technique for algorithm implementation 
which provides the performance of microcode, yet does not 
require detailed machine level microprogramming. The JRS 
AMGS is a software package that generates microcode from a 
high level language (HLL), thereby eliminating the need for
the programmer to be concerned with the details of
microcode. The user, therefore, need not understand
microcode programming and may apply the principles of
software engineering through the use of an HLL. Figure 1-1
[Ref. 1: p. 559] shows the steps involved in generating
microcode from the HLL using the AMGS. It is important to 
note where the AMGS is machine independent and where it is 
machine dependent. This will be important in later 
discussions of the system. 

Since the target machine of the JRS AMGS, the VAX 
11/780, is a horizontally microprogrammed processor, it is 
capable of executing a number of operations simultaneously. 
This is the key ingredient to improving the speed and 
efficiency of the executable code because several 
microoperations may be executed concurrently. [Ref. 1: pp.
558-559] By applying the JRS AMGS to the data manipulation
requirements of the satellite communication problem, a 
reduction in required CPU time should be achieved. 

Since the current method used by NRL for implementing 
the algorithms is to write them in Fortran, this project 
will compare the execution speed attained using the AMGS to 
the execution speed attained using Fortran code. The 
results of this comparison will provide an understanding of 
the type of algorithms that are suitable for implementation 
via the JRS AMGS, the performance improvements to these 
algorithms, and the costs of using this implementation 
technique. This study is based on two aspects of computer 
science: microprogramming and computer performance
evaluation. Before the study can be described, these two
areas must be defined.

Figure 1-1: AMGS Operation

Wilkes defined microprogramming as a method of 
implementing the control function of a computer. [Ref. 2:
p. 59] The major advantages of microprogramming are:

1) Low Cost: Microprogramming allows large instruction
sets to be implemented at a low cost because of the
simple design process. Developing a hardwired design of
an equivalent system would be very expensive.

2) Flexibility: With microprogramming, it is possible to
change the instruction set or to introduce a new set
after implementation. This may allow a computer to be
useful for many more years than originally planned.

3) Simplicity: Microprogramming allows for simpler
development due to the decrease in internal circuitry.
This simpler design facilitates maintenance and reduces
the problems associated with upgrading the design in the
field.

4) Speed: Although microprogramming is slower than some
hardwired designs, a microprogrammed implementation will
run faster on most algorithms than an equivalent machine
language implementation. This is due to the machine
language fetch and decode overhead. [Ref. 3: p. 5]

A major disadvantage of microprogramming is the memory
delay penalty for fetching each microinstruction from the
control store. This fetch penalty can result in slow
execution times if not dealt with properly, but the problem
can be made less significant by providing an overlap
between the fetch and the execute portions of
microinstructions. [Ref. 3: p. 5]

Since this project is concerned with comparing Fortran 
code with microcode, it is important to review the 
tradeoffs between using Fortran (or some other HLL) and 
microcode. Programming in microcode is very tedious and 
complex because the programmer must deal with the details 
of the machine. However, it is this complexity of
microcode that can, through proper programming, lead to a
speed advantage. On the other hand, Fortran and similar
HLLs are not nearly as complex because the details of the
machine are handled by the compiler. The slower execution
speed for HLLs results from both the generalization
required in the code generation portion of the compiler and
the instruction fetch and decode penalty described in the
explanation of the microcode speed advantage.

Microprogramming, in its present state, may be used to 
provide efficient implementations of the control function 
on computers. While not providing the fastest execution 
speed possible, microcoding will provide a given level of 
throughput at a cheaper price than is otherwise possible. 
In addition, the speed of microcode has recently improved 
because of the development of fast, inexpensive 
semiconductor memories. These are the two main reasons to 
suggest that the AMGS can give a performance advantage over
Fortran source code. 

Since this study involves the evaluation of the
performance of microcode, it is important to review the 
relevant performance evaluation techniques, methods, and 
problems. The performance evaluation in this study is a 
comparison between different implementations of the same 
algorithm. The classic application of performance 
evaluations is on operating systems to determine how to 
improve the system. To achieve the comparison, the 
evaluators must define a benchmark which represents the 
type of workload that occurs on that system. Defining this 
workload properly and accurately is very important or the 
results will be invalid. In the case of this study, a 
major consideration is the definition of the algorithms to 
be implemented and compared. 

One option when picking the algorithms is to choose a 
very specific application area and test only within that 
area. Another option is to attempt to test the entire 
realm of possible applications which would take many 
different algorithms. In Chapter Three, the application 
areas of interest will be defined and the subsets of these 
areas to be tested will be identified. The tests will be 
as comprehensive as possible and will cover as large an
area as possible; however, exhaustive testing of the entire
realm of applications is not possible.






The evaluation technique will consist of implementing
each algorithm in both Fortran and the AMGS HLL. Both
the Fortran and the HLL codes will be executed and timed,
and the execution times will be compared to determine the
change in performance with the HLL microcode version.
Since the algorithms are grouped according to application,
it is possible to determine which applications have
increased throughput from use of the AMGS.
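
The comparison described above reduces to a ratio of execution times. The following is a minimal sketch in Python, assuming each algorithm yields one Fortran time and one microcode time in seconds. The function names and the grouping of timings by application are illustrative assumptions and are not part of the AMGS or of the test procedure defined in later chapters.

```python
def speedup(fortran_seconds, microcode_seconds):
    """Return the Fortran time divided by the microcode time.

    A value above 1 means the microcode version ran faster.
    """
    if fortran_seconds <= 0 or microcode_seconds <= 0:
        raise ValueError("execution times must be positive")
    return fortran_seconds / microcode_seconds


def speedup_by_application(timings):
    """Summarize speedup for each application area.

    timings maps a (hypothetical) application name to a list of
    (fortran_seconds, microcode_seconds) pairs, one pair per algorithm.
    """
    result = {}
    for application, pairs in timings.items():
        total_fortran = sum(fortran for fortran, _ in pairs)
        total_microcode = sum(microcode for _, microcode in pairs)
        result[application] = speedup(total_fortran, total_microcode)
    return result
```

Summing the times within an application before dividing weights each algorithm by how long it runs; averaging the individual ratios instead would be an equally defensible choice and may give a different figure.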

Several contributing factors must be considered during 
this performance evaluation. The effect of using an HLL 
instead of assembly language or direct microprogramming to 
implement the microcode version is significant because of 
the costs involved in using each method. Similar costs are 
associated with all HLLs, be it the JRS HLL and its 
associated compiler or Fortran and its resultant 
translation. Likewise looking at the more primitive 
languages of assembly code and microcode, the costs of 
programming in both languages are very similar. However, 
not so obvious is the tradeoff involved in choosing one 
type of language (high level versus low level) over 
another. The specific compilation techniques may also have 
an effect on the efficiency of the product and must be 
considered. The microcode compaction method used will 
certainly affect how fast the microcode executes. A 
performance evaluation must analyze many factors, both 
individually and combined, to produce valid results. 

Chapter Two is a discussion of the issues of AMGS
design, compiler technique, code generation, and code
optimization. The purpose of this discussion is to assess
[...]

[...] Because the AMGS is being evaluated for all applications,
this chapter defines the basic areas that are tested in the 
project. After the basic areas are defined, the specific 
tests developed to cover these areas are explained and the 
information to be gained from each test is outlined. 

Chapter Four explains the mechanics of the testing 
including the timing mechanism and the effects of language 
features on the tests. An analysis of possible sources of 
errors is also included here to explain the validity of the 
results.

Chapter Five compares the data from the tests and 
analyzes the results. A step-by-step explanation of the
testing is given to ensure a proper understanding of
why certain tests were performed. The last chapter
summarizes the results and makes deductions and 
recommendations for further research in this area. 

II. BACKGROUND



With the many advantages of microprogrammed computers,
there is no apparent reason why microprogramming should not
be used for most applications. Low cost, fast execution 
time, and simplicity of design sound like exactly what a 
computer designer desires. There is, however, the problem 
of developing the microprograms (commonly called firmware) 
for the computer in a reasonable amount of time and with 
reasonable cost. Developing firmware has been a costly, 
error prone, and slow process because it has been done 
manually and because of the details that must be handled by 
the microprogrammer.

The obvious answer is to eliminate the use of low level 
languages and place the microprogrammer into the world of
high level languages. That is the intent of the AMGS. 
However, along with the advantages of an HLL come problems 
and considerations that cannot be ignored. This chapter 
will explore the many considerations of the AMGS and 
discuss their impact on the JRS AMGS. The following topics 
are considered to be the most relevant and will be 
discussed in depth in this chapter: 1) High Level Languages
and Microprogramming, 2) Machine Independence, 3)
Compaction and Optimization, 4) JRS AMGS Limitations, and 
5) Performance Evaluation Methodology. 

A. HIGH LEVEL LANGUAGES AND MICROPROGRAMMING



Higher level languages are designed to simplify 
programming by isolating the programmer from the details of 
the machine and placing him at a higher level of
abstraction. An AMGS removes the programmer from the
details of microprogramming and allows the programmer to
write the code in an HLL. Writing a program in an HLL
takes much less time than writing the same program in
microcode because the programmer must deal with fewer
details. There are many studies that have shown the
advantages of using HLLs instead of assembly code. One such
study claims that a programmer produces a set number of
lines of code per day, independent of the type of code.
Since one line of HLL code will produce many lines of
microcode, it is logical to opt for the HLL if all other
factors are equal. [Ref. 4: p. 145]

However, all other factors are not equal. Since the 
HLL used for generating the microcode is a special purpose 
language, any program written to use this system must be 
translated from another language before it can be used. In 
this particular case, the time required to program in the 
HLL provided by JRS must be considered. Of course the time 
required to translate the algorithm to the HLL should be 
much less than would be required to translate the algorithm
into assembly language code or into microcode. Since the
JRS HLL is a language heavily influenced by the block
structure of Algol, Fortran, and Pascal, any algorithm
written in a block structured language should be easy to
translate to the JRS HLL.

Software engineering literature provides many reasons 
for using HLLs. One such reason is that HLLs provide 
security not possible using microcode or assembly code.
Forced typing of variables is one example of the security
provided by HLLs. High level languages also provide
features such as subprograms which are an advantage because
they assist the user in subdividing the program into
logical units. These logical units make the problem easier
to understand and handle.

The chief advantage of an HLL is the ease of program 
maintenance which results in lower life cycle costs for a 
program. Program changes can be very expensive if the
programmer must read and understand low level code, with or
without good documentation. Through use of an HLL, program
changes can be made much more quickly and simply, with
reduced costs.

The advantages of HLLs are all 'nice' for the
programmer, but it is important to consider the
side-effects of HLL usage. If the advantages of an HLL
detract from the advantages of microprogramming (i.e.,
simplicity, cost, flexibility, speed) then using the AMGS
may not be justified. On the other hand, if the AMGS
eliminates or minimizes other problems of microcode, then
the AMGS will become even more desirable. One such
undesirable property of microcode is its machine 
dependence. In the next section we will look at the effect 
of using an HLL on the machine dependence of the resulting 
microcode. 

B. MACHINE INDEPENDENCE 

Machine independence is a major concern during the 
development of an AMGS because of the desire to make 
firmware portable. The AMGS is a tool used to help achieve 
the goals of machine independence and portability. Machine 
independence and portability are desirable characteristics
for any computer language because if the code may be used 
on more than one computer, the overall firmware development 
costs will be lower. If every different target machine 
must have its own version of the algorithm written 
specifically for it, then the cost of program development 
will be a function of the number of target machines. A 
much more desirable method is to write one program that may 
be used on every machine resulting in only one program 
being developed. 

A machine dependent language is "a language in which
all operations and data elements defined in the language
have a direct mapping to a resource of the target machine."
[Ref. 5: p. 194] The actual microcode is such a language
because it specifically addresses the available registers
of a machine. To remain machine independent and avoid the
problems of machine dependence, a language must avoid the 
details of the target machine and remain general enough to 
not become tied to any specific instruction set. This can 
be accomplished by defining an overall class of machines 
and then writing the language to fit into that definition. 
Such a definition includes such items as the minimum number 
of registers, the minimum stack size, and other hardware 
related items. Capitalizing on the similarities and
avoiding the differences of the machines in the class
simplifies this task. Any item that is not common to all
machines in the class must not be included in the
definition because it cannot be supported by all machines
in the class.

The AMGS supports machine independence and portability
of microcode by providing an intermediate language and an
[...] machine independent, a user may write a program in the AMGS
HLL and then use the code on different target machines. 
The major problem is making the transition from the machine 
independent intermediate language to the target machine’s 
microcode. To achieve this, each machine requires a 
separate code generator to translate the intermediate 
language to the microcode level plus a compactor to compact 
the resulting microcode. This is not a trivial step and 
there is currently considerable research being conducted on
microarchitecture description techniques that will assist
in making this step easier. Geiser has introduced a
description methodology that covers four basic areas:

1) Microinstruction description: includes the format of
the microinstruction, fields used in the
microinstruction, and possible values in each field.

2) Element descriptions: describes and names elements
of the machine hardware including memory, registers, etc.

3) Microoperation usage rules: a set of rules for
constructing valid microoperations.

4) Microengine behavioral rules: specifies interactions
between the microoperations. [Ref. 6: pp. 517-521]

By using this technique it is possible to describe the 
target machine in a standardized format so that the writing 
of the machine dependent code generator is much easier. 

Of course the description methodology does not 
eliminate the problem. The main purpose of a description 
methodology is to reduce the work required to port the 
language to another machine by maximizing the common 
features of the different machine dependent languages. 
This may eliminate desirable machine dependent features but 
it does permit a 'nearly machine independent' language.
The assumption is that if you cannot be totally independent,
then be as independent as possible. [Ref. 5: p. 195]

True machine independence has not been achieved in this
AMGS and probably will not be achieved in the near future;
however, the microarchitecture description methodology is an
attempt at reducing the portability problem. By providing
a systematic description of microarchitectures, the
description methodology reduces the amount of work required
to move a system to another comparable machine. The AMGS
is providing a step toward an ultimate goal of machine
independence that may never be achieved. However, the AMGS
has helped to define and simplify some of the steps
involved in making microcode generation less machine
dependent.

C. COMPACTION AND OPTIMIZATION 

Before reviewing the current compaction techniques it 
is important to understand the difference between 
compaction and optimization. Microcode compaction will
reduce the space required to store a program but does not
guarantee a reduction in execution time.
Optimization, on the other hand, results in a reduction in
execution time but does not guarantee that any code
compaction will occur. Sometimes execution time will
decrease when the code is compacted, but the reduction of
execution time is not guaranteed; in fact, execution time
can in some cases increase. The only conclusion that can
be drawn is that successful compaction guarantees fewer
total instructions and may lead to faster or possibly
slower execution.

Most HLL compilers include an optimization step;
however, the present version of the JRS HLL compiler does
not. There are two reasons for this. One reason is that
excessive optimizations prior to microcode generation can 
make error correction very difficult because of the 
movement of the microoperations. Secondly, since this was 
the first production version of an AMGS, some of the more 
difficult problems were not handled. Optimizing the 
compiler without excessively affecting error correction is 
one of the more difficult problems. The AMGS does as a 
whole include a number of optimization steps designed to 
produce more efficient microprograms. An example of such a 
step is the use of registers to hold array offset addresses 
to help reduce memory fetch delay. Even though none of the 
common compiler optimization techniques are used in this 
system, it is important to discuss them here to understand 
the effect they could have on microcode compaction. 

Gries gives a good explanation of the four main 
compiler optimization techniques that are applicable to 
almost any algebraic programming language such as Fortran, 
Pascal, Algol, PL/I, etc. The four methods are:

1) Folding: for any operator whose operands are known at
compile time, perform the applicable operation at compile
time rather than at execution time.

2) Eliminating redundant operations: mainly factoring out
common subexpressions.

3) Moving operations out of loops if their operands do
not change within the loop.

4) Reducing the number of multiplications in loops:
effectively changing the multiplications to additions.
[Ref. 7: pp. 376-377]

A system may use these techniques to attain whatever level
of optimization is desired; however, there is a tradeoff
between the level of optimization and the time required to
perform the compilation. Also, as mentioned above,
extensive optimization will radically alter
the sequence of operations and therefore make debugging
very difficult. [Ref. 7: p. 376]
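
The four techniques listed above can be illustrated by applying them by hand to a small loop. The sketch below uses Python purely for illustration. The function names, variable names, and constants are hypothetical, and a real compiler would perform these rewrites on its internal representation rather than on source text.

```python
def unoptimized(a, b, n):
    """Loop as a programmer might first write it."""
    out = []
    for i in range(n):
        scale = 60 * 60            # both operands are known constants
        base = (a + b) * (a + b)   # a + b is evaluated twice, every pass
        out.append(base + scale + i * 4)   # multiply by the loop index
    return out


def optimized(a, b, n):
    """Same result after the four rewrites are applied by hand."""
    scale = 3600             # 1) folding: 60 * 60 evaluated ahead of time
    s = a + b                # 2) redundant operation removed: a + b once
    base = s * s + scale     # 3) invariant work moved out of the loop
    out = []
    step = 0
    for _ in range(n):
        out.append(base + step)
        step += 4            # 4) i * 4 replaced by a running addition
    return out
```

Both functions return the same list for the same arguments. The second also shows why debugging becomes harder: the statements no longer appear in the order in which the programmer wrote them.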

Even though optimization is important, there has been 
very little work done on optimization of microcode. Almost
all of the work done on microcode has been in the field of
compaction because optimization of microcode is very 
difficult to do systematically and is not well understood. 
Most microcode compaction research has been justified by 
the assumption that execution time will decrease when the 
code is compacted. It is important to keep this assumption 
in mind when discussing compaction because the results of 
the compaction are not guaranteed to reduce execution time 
and will certainly not optimally reduce execution time. 
However, compaction is the only automated method for 
improving microcode that is currently available for
practical use. 

It is important to remember the assumption that the 
target machine will be horizontally microprogrammable, 
meaning that more than one operation may be executed during 
any microinstruction. If the target machine is not 
horizontally microprogrammable, then only one microoperation
may occur during any microinstruction (or machine cycle)
and compaction is not possible. There are two classes of
microcode compaction for horizontally microprogrammable
computers, local and global, and a discussion of the
compaction techniques from both classes will follow. JRS
does not do any code compaction in this version of the
AMGS. However, by reviewing the many methods of compaction
available it will be evident which methods are the most
promising for future improvements.

Local compaction of microcode is concerned with the
reduction of the number of microinstructions in a
straight-line code (SLC) segment of a microprogram. An SLC
segment is any sequence of microinstructions that begins
either at the start of the program or after a branch
statement and ends either at the end of the program or at a
branch statement. Only one entrance and one exit are
allowed in any SLC segment. Local compaction is simply an
attempt at reducing the number of microinstructions in each
SLC segment by combining instructions or eliminating
duplicated instructions. The most promising and popular
versions are first-come first-serve, critical path, branch
and bound, and list scheduling.
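
The definition of an SLC segment can be made concrete with a short sketch. The Python below is illustrative only: the representation of an operation as a (name, is_branch) pair and the optional list of branch target positions are assumptions made for this example. Starting a new segment at a branch target is this sketch's way of enforcing the single-entrance rule.

```python
def split_into_slc_segments(ops, branch_targets=()):
    """Split a microprogram into straight-line code segments.

    ops is a list of (name, is_branch) pairs in program order.
    branch_targets holds the positions that some branch jumps to.
    """
    targets = set(branch_targets)
    segments = []
    current = []
    for index, (name, is_branch) in enumerate(ops):
        if index in targets and current:
            segments.append(current)   # a jump target opens a new segment
            current = []
        current.append(name)
        if is_branch:
            segments.append(current)   # a branch closes the segment
            current = []
    if current:
        segments.append(current)       # the end of the program closes the last one
    return segments
```

Each returned segment can then be handed to a local compaction method independently of the others.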

First-come first-serve is probably the simplest form of 
local compaction possible. Each microoperation is
considered only once, in source code order, and within the
SLC segment in which it exists. Each microoperation is moved
as far forward in its segment as possible. If it can be
combined with a previous operation without causing a
conflict, then it will be combined. Once a microoperation
has been checked and combined or not combined, it will
never be considered again. This results in fast compaction
but the resulting microcode is not optimally compacted.
[Ref. 8: p. 415]
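
A minimal sketch of first-come first-serve compaction follows, written in Python for illustration. The machine model is an assumption made for this example: each microoperation names one hardware unit and the registers it reads and writes, two microoperations conflict if they need the same unit, and any shared register involving a write is treated as a dependency that forces the later operation into a later microinstruction. Real horizontal machines have more detailed conflict rules.

```python
from collections import namedtuple

# Hypothetical micro-operation record: unit is the hardware resource used,
# reads and writes are sets of register (or memory) names.
MicroOp = namedtuple("MicroOp", "name unit reads writes")


def depends_on(later, earlier):
    """True if `later` must stay after `earlier`."""
    return bool(
        earlier.writes & later.reads      # value produced, then used
        or earlier.reads & later.writes   # value used, then overwritten
        or earlier.writes & later.writes  # same location written twice
    )


def fcfs_compact(ops):
    """Pack the micro-operations of one SLC segment into microinstructions."""
    instructions = []   # each microinstruction is a list of MicroOps
    for op in ops:      # every operation is considered exactly once
        # Earliest legal position: just after the last microinstruction
        # that holds an operation this one depends on.
        earliest = 0
        for i, instr in enumerate(instructions):
            if any(depends_on(op, placed) for placed in instr):
                earliest = i + 1
        # From there, take the first microinstruction whose unit is free.
        slot = earliest
        while slot < len(instructions) and any(
            placed.unit == op.unit for placed in instructions[slot]
        ):
            slot += 1
        if slot == len(instructions):
            instructions.append([])
        instructions[slot].append(op)
    return [[op.name for op in instr] for instr in instructions]


segment = [
    MicroOp("load_a", "mem", {"A"}, {"r1"}),
    MicroOp("add", "alu", {"r1", "r2"}, {"r3"}),
    MicroOp("shift", "shf", {"r4"}, {"r5"}),
    MicroOp("load_b", "mem", {"B"}, {"r6"}),
]
print(fcfs_compact(segment))  # [['load_a', 'shift'], ['add', 'load_b']]
```

In this example four microoperations fit into two microinstructions. Because an operation is never revisited once placed, a poor early placement cannot be undone, which is why the result is not guaranteed to be optimal.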

Critical path algorithms compact microcode in each SLC 
segment by identifying microoperations "that cannot be 
delayed without increasing the number of microinstructions 
needed for the microprogram." [Ref. 8: p. 415] This is
accomplished by first identifying the longest paths in the
data dependency graph. Each of the longest paths is called
a critical path and shortening the path will result in a
more compact program. Each microoperation in each critical
path is checked to see if it can be moved forward and
combined with another microoperation. If it can be moved
forward, the critical path will be shortened and the result
is a more compact program. If any microoperation in any of
the critical paths is delayed (not forwarded as much as
possible), then the trailing microoperations will be
delayed, which will result in more microinstructions than
are actually needed and less compact microcode. [Ref. 8: p.
422] Once again the results are not optimal and the time
required to do the compaction is a polynomial function of
the number of microoperations which are considered in each
SLC segment.
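
The identification of critical microoperations can be sketched as a longest-path calculation on the data dependency graph. The Python below is illustrative and makes two assumptions: microoperations are numbered in source order, and each dependency is given as an (earlier, later) pair of those numbers. Resource conflicts are ignored, so the length returned is only a lower bound on the number of microinstructions.

```python
def critical_operations(n_ops, depends):
    """Return (minimum length, indices of operations on a critical path)."""
    preds = {i: [] for i in range(n_ops)}
    succs = {i: [] for i in range(n_ops)}
    for earlier, later in depends:
        preds[later].append(earlier)
        succs[earlier].append(later)

    # Earliest microinstruction each operation could occupy.
    earliest = [0] * n_ops
    for i in range(n_ops):
        for p in preds[i]:
            earliest[i] = max(earliest[i], earliest[p] + 1)

    length = max(earliest) + 1 if n_ops else 0

    # Latest microinstruction each operation could occupy without
    # lengthening the segment.
    latest = [length - 1] * n_ops
    for i in reversed(range(n_ops)):
        for s in succs[i]:
            latest[i] = min(latest[i], latest[s] - 1)

    # An operation with no freedom of movement lies on a critical path.
    critical = [i for i in range(n_ops) if earliest[i] == latest[i]]
    return length, critical
```

For five operations with dependencies 0 to 1, 1 to 2, and 3 to 2, the call `critical_operations(5, {(0, 1), (1, 2), (3, 2)})` gives a minimum length of three and marks operations 0, 1, and 2 as critical. Operations 3 and 4 can be delayed without adding a microinstruction.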

Branch and bound algorithms can guarantee optimality in 
storage space required for the microprogram. Remember that
this says nothing about the execution time of the program.
The method depends upon searching a tree structure
exhaustively, looking for the optimal ordering. This
method may produce optimal compaction, but the time
required is an exponential function of the number of
microoperations in the microprogram, making the method very
expensive. There are variations to the branch and bound
algorithms that are not so expensive. One such variation
involves pruning the tree structure prior to searching the
tree. This pruning reduces the cost of the algorithm to a
polynomial function of the number of input microoperations.
However, the reduction in cost also produces less than
optimal microcode. [Ref. 8: p. 424]

List scheduling searches through each SLC segment and 
attempts to schedule each microoperation at the earliest
possible point within the window of code that is being
considered. The size of the window is variable but the
larger the window the longer the time required to do the
job. Also, as the window size is increased, there is a
diminished return (diminished amount of code compaction)
for each unit increase in window size because of the
increased chance of finding a data dependency. The further
away the compaction is attempted, the greater the chance of
two data items needing the same register, or some other
data dependency. List scheduling is not optimal, but the
cost is as low as first-come first-serve and the results
are better than first-come first-serve.
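
A sketch of list scheduling in its common priority-list form is given below in Python. It is an illustration under stated assumptions, not the method of any cited paper: each microoperation uses one hardware unit, dependencies are given as (earlier, later) index pairs in source order, priority is the length of the longest dependency chain that starts at the operation, and the window described above is not modelled (the whole segment is treated as one window).

```python
def list_schedule(units, depends):
    """Schedule one SLC segment.

    units[i] is the hardware unit used by operation i.
    Returns a list of microinstructions, each a list of operation indices.
    """
    n = len(units)
    succs = {i: [] for i in range(n)}
    preds = {i: set() for i in range(n)}
    for earlier, later in depends:
        succs[earlier].append(later)
        preds[later].add(earlier)

    # Priority: longest chain of dependent operations starting here.
    height = [1] * n
    for i in reversed(range(n)):
        for s in succs[i]:
            height[i] = max(height[i], height[s] + 1)

    done = set()
    schedule = []
    while len(done) < n:
        # Operations whose predecessors were all placed in earlier
        # microinstructions are ready to be scheduled now.
        ready = [i for i in range(n) if i not in done and preds[i] <= done]
        ready.sort(key=lambda i: (-height[i], i))
        busy = set()
        instr = []
        for i in ready:
            if units[i] not in busy:   # one operation per unit per cycle
                busy.add(units[i])
                instr.append(i)
        done.update(instr)
        schedule.append(instr)
    return schedule
```

With `units = ["alu", "alu", "mem"]` and the single dependency (1, 2), the sketch places operation 1 first because a longer chain hangs from it, and the segment fits into two microinstructions. Handling the same three operations strictly in source order would place operation 0 first and need three.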

Of these four local methods, only list scheduling and 
first-come first-serve can be done in what is considered a
'reasonable' amount of time and produce acceptable results.
The fact that list scheduling produces better results than
first-come first-serve in general was shown in a study done
by Davidson, et al. [Ref. 8: p. 460] This would justify
the use of list scheduling as the compaction method for the
AMGS if only local compaction methods were available;
however, there are global compaction techniques that should
be considered. It is an intuitive notion that global 
compaction techniques should provide better compaction 
since they look at the entire program and not only at small 
SLC segments. 

It is true that, in general, global compaction 
techniques provide better compaction than local compaction 
techniques; yet, in comparison to local compaction
techniques, global compaction techniques are very
expensive. Trace scheduling, tree compaction, and
compaction based on a generalized data dependency graph
(GDDG) are the three most promising global compaction
techniques. Trace scheduling identifies the most
frequently traversed path through a section of microcode
and does a local compaction on that path. The process is
repeated on all of the paths through the microprogram until
no further microoperation movement is possible. A data
dependency graph must be constructed for each path analyzed
and any microoperations that are moved must be documented.
This documentation is done to ensure that the moving of
microoperations will have no effect on other loops. The
bookkeeping for trace scheduling is the most expensive
part. In fact, in the worst case, the memory required to
run a trace scheduling compactor can grow exponentially.
[Ref. 9: p. 480] Therefore, although trace scheduling does
an excellent job of microcode compaction, the overhead is
too high to justify its use.

Tree compaction is based on trace scheduling. The 
advantage of tree compaction over trace scheduling is the 
control of the increase in memory size. Tree compaction 
divides the microprogram into subsets and applies the trace 
scheduling techniques to the subsets individually. This 
achieves compaction that is close to the results achieved 
by trace scheduling yet is not nearly as expensive. This
method may be useful when it is fully researched and
understood; however, tree compaction still produces
microcode that is less than optimum and the cost can be
high.

The third global compaction method is based on a generalized
data dependency graph (GDDG). A GDDG "is capable of
representing in a single chart the data dependency of
microorders not only within a basic block but in different
basic blocks." [Ref. 10: p. 924] Both trace scheduling and
tree compaction use a data dependency graph (DDG) to
represent the data dependency of microorders in the basic
blocks; however, a DDG is not capable of representing the
data dependencies beyond the basic block. This is the most
important aspect of global compaction: moving microorders
to adjacent blocks when possible.

Through use of the GDDG, it is possible to identify 
microoperations that 'must' be in a basic block and those
that 'may' be in a basic block. Then, by identifying the
frequency of execution of the separate blocks it is
possible to make intelligent choices about moving
microoperations from block to block or within the same
block. The algorithm costs an amount which "is practically
O(n), where n is the number of microorders contained in a
source microprogram." [Ref. 10: p. 930] This is a very low
cost and the preliminary results show that the algorithm
provides compaction that is within three to five percent of
optimum (handwritten) microcode. 

Of the three global compaction methods described, only 
the method based on a GDDG is efficient and results in low
costs. Why then did JRS not use this compaction method in
the AMGS? The answer is that during development of the
AMGS, this compaction method was not available. JRS is
currently revising the system to incorporate the GDDG 
global compaction technique, which should result in a much 
more efficient system than was evaluated in this study. 

By looking at the two main compaction methods, global 
and local, it is evident that global compaction holds the 
most promise for efficiency that approaches the optimum. 
Once global compaction methods are more thoroughly 
researched and developed, they will become the logical 
choice if the cost can be controlled. Global methods are 
the only methods that approximate the handcoded versions. 
Local compaction does provide some compaction but does not 
in general do as well as handwritten microcode. 

D. AMGS LIMITATIONS 

The AMGS developed by JRS is designed to allow a small 
CPU intensive algorithm to be compiled in microcode and 
placed in the WCS of the VAX 11/780. When the algorithm is 
needed it can be called from a Fortran program. There are 
several limitations of the system that are important to 
remember when considering what applications may be used on 
this system. Individually the limitations may seem small 
and even unimportant; however, the combined effect of the 
limitations may eliminate some of the applications. 

First, the WCS has only 1K words of memory for the 
microcode. Since the microcode must be loaded into the WCS 
before execution due to linkage requirements, paging of the 
algorithm into the WCS during execution is not considered 
an option. Therefore the user is limited to an algorithm 
or collection of algorithms that is no larger than 769 
microinstructions because the other 255 instructions are 
used for predefined functions. In fact, of the 769 
microwords of memory available, about 30 instructions are 
already taken up by function entry and exit code that is 
required for register initialization and cannot be 
modified by the user. The exact number of instructions 
varies depending upon the microprogram being executed. 
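
The space budget described above reduces to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the entry and exit code is treated as a fixed cost of 30 words, which is an assumption since the exact figure varies with the microprogram. 

```python
WCS_WORDS = 1024        # 1K words of writeable control store
PREDEFINED_WORDS = 255  # reserved for predefined functions
ENTRY_EXIT_WORDS = 30   # approximate; assumed fixed for this sketch


def usable_wcs_words(entry_exit_words=ENTRY_EXIT_WORDS):
    """Return the microwords left for the user's own algorithm."""
    return WCS_WORDS - PREDEFINED_WORDS - entry_exit_words


def fits_in_wcs(microcode_words, entry_exit_words=ENTRY_EXIT_WORDS):
    """Return True if a compiled routine of the given size fits in the WCS."""
    return microcode_words <= usable_wcs_words(entry_exit_words)
```

With these assumptions roughly 739 microwords remain for the algorithm itself, so a routine that compiles to 760 microinstructions would not fit even though it is under the nominal limit of 769. 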

Compacting a long algorithm to fit into the limited 
space of the WCS may be difficult or even impossible. Once 
the user has determined that the algorithm will fit in the 
WCS, then he/she must determine the 'hot' spots of the 
program (portions of the algorithm that use the most CPU 
time), separate those parts of the program from the rest, 
code those parts in the JRS HLL, and set up the microcode 
procedure call. This may be only a minor inconvenience, 
but it is extra effort needed to use the AMGS. 

Second, JRS claims that the AMGS code will do integer 
arithmetic and comparisons very quickly, but any problem 
involving primarily floating point arithmetic will achieve 
minimal, if any, increase in performance. This is because 
the JRS HLL uses the same floating point acceleration 
routines as the Fortran program. Portions of the floating 
point algorithm that do not use the floating point 
accelerator may execute faster when executed on the AMGS, 
but the net gain will probably not be very large due to the 
overhead of the floating point accelerator. During the 
testing of the AMGS the truth of this claim by JRS will be 
documented since there will be several tests to check the 
floating point accelerator performance. 

The JRS HLL is set up to support only integer and 
floating point data structures. No character data 
structure is available and therefore applications using 
characters are not considered feasible. Arrays of integers 
and floating point numbers are possible but the lack of a 
character data structure will limit some applications or at 
least make them very difficult to do. 

If the algorithm includes I/O then the algorithm must 
be rewritten to eliminate the I/O from the portion of the 
algorithm to be coded in JRS HLL since the HLL does not 
include any I/O statements. The I/O can normally be moved 
into the Fortran program that will call the WCS program. 
Besides providing an I/O function, the Fortran program will 
set up any data structures needed for the program. This is 
really no more than a minor inconvenience, but it does 
complicate the use of the system. 

Several other restrictions are listed in the AMGS 
manual and repeated below. 

1) Combined maximum of fourteen arrays and compiler 
temporary variables. 

2) Maximum of twenty DO-loops nested at any one time. 

3) Maximum of five hundred symbols may be defined in a 
program. 

These restrictions will not, in general, eliminate 
applications but they are restrictions based on the 
implementation of the system on the VAX 11/780. These 
restrictions are important because they point out some of 
the machine dependencies that exist even when an attempt is 
made to remain machine independent. 

The final limitation of the JRS AMGS is a simple 
observation. One of the main motivations for having an 
AMGS is to allow for portability of the microcode. 
Presently, this system is only implemented on the VAX 
11/780. Therefore, a current, yet hopefully temporary 
limitation is that the AMGS has not been programmed to 
generate microcode for any other machines. This limitation 
will result in eliminating many of the advantages of the 
AMGS if it is not corrected. 

Assuming the application algorithm can be coded around 
these limitations, the user should be able to achieve 
better throughput by using the AMGS. A goal of this 
project is to make it easier for a user to determine if a 
potential application will benefit from the use of the 
AMGS. 

E. PERFORMANCE EVALUATION METHODOLOGY 

The performance evaluation was conducted to determine 
the throughput possible using the AMGS. There are many 
techniques available for doing performance evaluations 
including hand timing, formula methods, instruction mixes, 
and benchmarks, each having individual advantages and 
disadvantages. The method used for this evaluation must be 
capable of comparing two different programs and of giving 
accurate results. Therefore a collection, or benchmark, of 
programs was defined with each program representing a 
different possible application for the AMGS. 

This kernel of programs was carefully developed to 
contain the characteristics of the many possible algorithms 
which might be run on the system. This is a very important 
step for the validation of the results. If the proper 
program characteristics are not tested, the results will be 
invalid. By categorizing the algorithms according to 
application it is possible to specify what applications 
will benefit by use of the AMGS. 

After defining a kernel of programs and coding them in 
both Fortran and the JRS HLL, the programs were run and the 
results compared. Besides execution time, other factors 
previously discussed in this chapter were considered: ease 
of programming, system reliability, and the compatibility 
of the application problem with the AMGS. 

One important question is how much better a manual 
microprogrammer could do. The purpose of using the AMGS is 
to achieve increased throughput without using a large 
amount of programming time as would be required with the 
manual method. Even though manual microprogramming is 
costly due to development time, it is considered the 
standard and the results of the performance evaluation 
should be compared against the standard. By comparing all 
three execution times (Fortran, JRS microcode, and hand 
written, actually hand compacted, microcode), it will be 
possible to identify the best applications and possibly 
determine methods for making the slower applications 
faster. 



F. BACKGROUND SYNOPSIS 



Since the main factors affecting the AMGS have been 
reviewed, the next step is to determine the kernel of 
programs to be tested. These programs must be 
representative of the applications that might be used on 
the AMGS. The purpose of defining this kernel is to attain 
general results that will give an AMGS user an idea as to 
the effectiveness of a specific application. The next 
chapter will discuss the applications to be tested and the 
programs used to test those applications. 



III. PROGRAM GENERATION 



There are many limitations that must be considered when 
choosing the proper benchmark for a system. The benchmark 
must take into consideration the AMGS limitations 
enumerated in the previous section and ensure that the 
results are not biased by those limitations. Limitations 
such as the WCS size and the existence of only integer and 
real data structures have a major effect on the 
applications possible when using the AMGS. With these 
limitations in mind, it is possible to define some 
applications that can use the AMGS. One common class of 
application that will definitely not see increased 
throughput from the AMGS is the I/O intensive application. 
The HLL was designed without I/O capability because 
microcode implementations do not increase the throughput 
for I/O intensive applications. However, there are several 
applications for which the AMGS should theoretically 
provide increased throughput. 

The applications tested in this study are grouped into 
four basic areas. These areas are: 

1) Integer mathematics 

2) Floating point mathematics 

3) Sorting and Searching (Comparisons) 

4) Bit manipulations 

There are several subcategories in the four basic areas. A 
discussion of the subcategories follows. 

Mathematically intensive applications that do 
calculations within the limits of the AMGS are prime 
candidates for the system. There are several different 
types of mathematical calculations that should be 
considered. Integer arithmetic must be considered 
separately from floating point arithmetic due to the 
different methods used for doing the calculations. Integer 
addition/subtraction is handled internally by the AMGS, but 
the floating point accelerator (FPA) on the VAX computer is 
used for floating point calculations and integer 
multiplications. This call by the AMGS to the FPA results 
in a significant amount of overhead for each call. When a 
Fortran program calls the FPA there is also some overhead, 
but since Fortran translates to machine code and machine 
code calls to the FPA involve less overhead than AMGS 
calls, the net result is slower execution time for the AMGS 
code than for Fortran code during floating point 
operations. This extra overhead in the AMGS is due to a 
requirement to save the state of the microprogram prior to 
executing the floating point operation. The result is a 
net loss of throughput when doing floating point 
calculations or integer multiplication on the AMGS. 

Several types of calculations are possible when doing 
integer and floating point calculations. Division, 
multiplication, addition, and subtraction are different 
arithmetic operations and the increase in throughput may be 
different for each type of calculation. As much as 
possible, this project will categorize the different 
calculations and show the percentage of increase possible 
for each category; however, in the interest of reducing the 
total number of tests we will combine tests that are very 
similar. Since addition and subtraction take the same 
amount of time in microprogrammed processors, they will be 
tested together. Multiplication and division are not 
implemented similarly and will not be tested together. In 
fact, since division can usually be implemented as 
reciprocal multiplication, division will not be tested. 
Integer exponentiation is normally accomplished by a series 
of multiplications and therefore will be considered a part 
of the multiplication test. 

Another major application of the AMGS is sorting and 
searching. Since sorting and searching both include 
comparisons of bit patterns, they may be considered 
together in one broad category. The major difference is 
that sorting usually includes moving of data or moving the 
pointers to the data, while searching simply involves 
comparing until the desired data is found. 

One final category that is directly applicable to the 
NRL problem is bit manipulation. This category includes 
the comparing, shifting, and replacement of bits or fields 
of bits within a word. This category may be an excellent 
application of the AMGS due to the bit manipulating 
commands that are built into the JRS HLL such as the shift, 
swap, and mask functions. Fortran has the ability to do 
the bit manipulations, but the functions are provided 
through library calls which tend to be slower than direct 
language implementation constructs. 

The next section of this chapter is an explanation of 
each test and the basic area it is designed to test. The 
explanation of the results of each test is included in 
Chapter Five. Table 3-1 lists the four basic areas and the 
tests that cover each area. 



Table 3-1: Specific Tests 

Integer Math                    Floating Point Math 
  1. Do Loop                      1. Chebyshev Cosine 
  2. While Loop                   2. Fast Fourier Transform 
  3. Summation 
  4. Factorial 

Sorting/Searching               Bit Manipulations 
  1. Bubble Sort                  1. Bit Manipulation 
  2. Sieve of Eratosthenes        2. Bit Reversal 
  3. Quicker Sort 
  4. Binary Search 



The simplest test was designed using the loop 
structures. The WHILE loop and the DO loop provide a 
method for testing addition or multiplication and comparing 
the results directly with the Fortran equivalent. The 
simplest test is a WHILE loop that only increments the 
loop counter. This test can be done as many times as the 
user desires and it also can be nested to any desired level 
to test the effect of nesting. The basic area being 
checked in this test is the addition and comparison 
required each time a loop is completed. This comparison is 
required to determine the test condition for exiting the 
loop. A DO loop is another version of the loop construct, 
with the increment being automatic and the condition test a 
part of the DO statement. By using these two tests it is 
possible to document how much time is required to execute 
the overhead steps in any loop. This overhead cost will be 
used to analyze programs with loops. 

The next two tests use the basic loop structure to 
determine the summation of an integer or the factorial of 
an integer. Each of these tests can then be used with the 
results from the previous test to determine the amount of 
time required to do either an integer multiplication or an 
integer addition by simply subtracting out the loop 
overhead. 
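
The subtraction described above can be written out as a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes the loop-only test and the arithmetic test run the same number of iterations so that the loop overhead cancels. 

```python
def per_operation_seconds(test_seconds, loop_only_seconds, iterations):
    """Estimate the cost of one arithmetic operation inside a timed loop.

    test_seconds      -- time for the loop that performs the operation
    loop_only_seconds -- time for the same loop with an empty body
    iterations        -- number of passes through the loop in both tests
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return (test_seconds - loop_only_seconds) / iterations
```

For example, if a summation test took 5.0 seconds, the matching empty loop took 3.0 seconds, and both ran one million iterations (illustrative figures, not measurements from this study), each addition would be estimated at two microseconds. 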

Floating point multiplication is the subject of the 
next test. By implementing a Chebyshev approximation for 
the cosine of an angle and calculating many values, it is 
possible to determine the amount of time spent doing 
floating point multiplication for each system. There are 
some floating point additions that will add overhead to the 
test, but the effect of the additions should be minimal. 
This test particularly reveals the overhead of calling the 
floating point accelerator from the AMGS. JRS 
documentation states that since both Fortran and the AMGS 
use the same FPA, there should be no speed gained by use of 
the AMGS. If the overhead of calling the FPA from the 
microcode is too high, then it will make the AMGS slower 
than the Fortran. This is an important experiment since 
the results will be a prime factor in determining if the 
AMGS should be used for floating point applications. 
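
To make the character of this test concrete, the following is a minimal sketch of a Chebyshev approximation to the cosine. It is not the routine used in this study, whose coefficients and interval are not given here; the function names, the interval from -pi to pi, and the choice of twelve terms are all assumptions. The evaluation loop shows why the test is dominated by floating point multiplications with a smaller number of additions. 

```python
import math


def chebyshev_coefficients(func, a, b, n):
    """Return n Chebyshev coefficients approximating func on [a, b]."""
    half_span = 0.5 * (b - a)
    midpoint = 0.5 * (b + a)
    # Sample func at the n Chebyshev nodes mapped from [-1, 1] to [a, b].
    samples = [
        func(math.cos(math.pi * (k + 0.5) / n) * half_span + midpoint)
        for k in range(n)
    ]
    return [
        (2.0 / n) * sum(
            samples[k] * math.cos(math.pi * j * (k + 0.5) / n)
            for k in range(n)
        )
        for j in range(n)
    ]


def chebyshev_eval(coeffs, a, b, x):
    """Evaluate the series at x using the Clenshaw recurrence."""
    y = (2.0 * x - a - b) / (b - a)  # map x from [a, b] to [-1, 1]
    y2 = 2.0 * y
    d, dd = 0.0, 0.0
    for c in reversed(coeffs[1:]):
        # One multiplication and two additions per coefficient.
        d, dd = y2 * d - dd + c, d
    return y * d - dd + 0.5 * coeffs[0]


COS_COEFFS = chebyshev_coefficients(math.cos, -math.pi, math.pi, 12)


def chebyshev_cos(x):
    """Approximate cos(x) for x in [-pi, pi]."""
    return chebyshev_eval(COS_COEFFS, -math.pi, math.pi, x)
```

In a microcoded version only the evaluation loop would be timed; the coefficients would be computed once beforehand and stored as constants. 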

There were three tests written to evaluate the ability 
of the system to do comparisons. The first is a sort 
algorithm called Quickersort written by R. S. Scowen. This 
algorithm works by continually splitting the array of 
values to be sorted into parts and sorting the parts using 
the same method. The second algorithm is a method to 
determine all of the prime numbers between two values. 
This problem, called the Sieve of Eratosthenes, uses 
additions, comparisons, and assignment statements to 
determine the prime numbers in a specified range. This 
algorithm will give an insight into how all three of these 
items interact to affect the throughput of the AMGS. The 
third test is a bubble sort that sorts an array of integers 
into ascending order. By using a loop construct, 
comparisons, and a simple assignment statement, this 
algorithm is an excellent example of a well structured 
module that does comparisons and uses assignment 
statements. 
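
Two of these comparison tests are small enough to sketch in full. The versions below are generic textbook forms, given only to show the mix of additions, comparisons, and assignments involved; they are not the programs used in this study, and the function names are hypothetical. 

```python
def primes_in_range(low, high):
    """Return all primes p with low <= p <= high (Sieve of Eratosthenes)."""
    if high < 2:
        return []
    is_prime = [True] * (high + 1)
    is_prime[0] = is_prime[1] = False
    candidate = 2
    while candidate * candidate <= high:
        if is_prime[candidate]:
            # Strike out every multiple of the current prime.
            for multiple in range(candidate * candidate, high + 1, candidate):
                is_prime[multiple] = False
        candidate += 1
    return [n for n in range(max(low, 2), high + 1) if is_prime[n]]


def bubble_sort(values):
    """Return a copy of values sorted into ascending order."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # already sorted; no further passes are needed
    return items
```

Neither routine uses multiplication in its inner loop apart from the sieve's loop bound, so both exercise the integer addition and comparison paths rather than the FPA. 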

Another test algorithm is a Fast Fourier Transform 
(FFT) written in two parts because the entire program would 
not fit into the WCS. One part is a bit reversal program 
that simply assigns elements of an array to different 
locations in the array. The other part is the complex 
multiplication plus a Chebyshev cosine and sine generation 
routine for use in the FFT. The bit reversal is an 
excellent comparison of assignment statements between the 
two methods and therefore goes in the bit manipulation 
category. The FFT complex multiplication is another 
floating point multiplication and addition algorithm. By 
using the results of these two algorithms, we gain an 
example of a long algorithm that uses the entire WCS (the 
FFT) plus an algorithm that is only concerned with moving 
values around in memory (the bit reversal). 
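
The bit reversal step can be illustrated as follows. This is a generic sketch of the reordering used by radix-2 FFTs, assuming the array length is a power of two; the function names are hypothetical and the code is not taken from the test program. 

```python
def reverse_bits(index, width):
    """Return index with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permute(values):
    """Return a copy of values reordered by bit-reversed index."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    width = n.bit_length() - 1
    items = list(values)
    for i in range(n):
        j = reverse_bits(i, width)
        if i < j:  # swap each pair only once
            items[i], items[j] = items[j], items[i]
    return items
```

For an eight element array the indices 0 through 7 are reordered as 0, 4, 2, 6, 1, 5, 3, 7. The work consists almost entirely of index computation and assignments, which is why this part of the FFT is grouped with the bit manipulation tests. 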

The final test is an algorithm to do bit manipulations 
using the bit manipulating functions provided by both the 
JRS HLL and the Fortran library. The algorithm takes an 
array of integers and performs different operations on the 
integers such as AND, OR, EXCLUSIVE OR, etc. These 
operations were chosen directly from the example NRL source 
code, so this test specifically tests the NRL application. 
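
The kinds of operations involved can be sketched as below. The NRL source code is not reproduced in this report, so the field layout, the 32-bit word size, and the function names are assumptions chosen for illustration. 

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word


def extract_field(word, offset, width):
    """Return the `width`-bit field that starts at bit `offset`."""
    return (word >> offset) & ((1 << width) - 1)


def replace_field(word, offset, width, value):
    """Return word with the given field overwritten by value."""
    mask = ((1 << width) - 1) << offset
    return ((word & ~mask) | ((value << offset) & mask)) & WORD_MASK


def swap_halves(word):
    """Exchange the upper and lower 16 bits of a 32-bit word."""
    return (((word & 0xFFFF) << 16) | ((word >> 16) & 0xFFFF)) & WORD_MASK


def combine(words, mask):
    """Apply AND, OR, and EXCLUSIVE OR with mask to every word."""
    return [(w & mask, w | mask, w ^ mask) for w in words]
```

Each function is a handful of shifts and logical operations with no arithmetic beyond mask construction, which is the workload the shift, swap, and mask functions of the JRS HLL are meant to handle directly. 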

With the test programs now fully defined, the next step 
is to describe the test runs and the timing mechanism used 
to perform the tests. The interdependence of tests will be 
discussed in the next chapter as well as the effect of 
using different language features on the individual tests. 
After discussing these effects the test results can be 
presented and analyzed. 



IV. TESTING 



The testing of the programs was done with the most 
accurate tool available so that any error in the timing 
mechanism would be minimized. That is why the timing 
mechanism and its accuracy were so important to the results 
of this study. Once the accuracy of the timing mechanism 
was determined, the minimum length of the test was 
specified to make the test length much longer than the 
possible error. Besides the testing mechanism, there are 
other aspects of program design that affect the execution 
time of the resulting object code. Since this is primarily 
a comparison between Fortran and the AMGS, both the factors 
affecting the execution time of compiled Fortran code and 
the factors affecting the AMGS were identified and 
considered during the programming phase of the project. 
The desire was to make the tests as equitable as possible 
in the two different languages. 

A. TIMING MECHANISM 

The VMS system library provides a software mechanism 
for timing blocks of code. There are no hardware monitors 
available to time individual programs and hand timing is 
very inaccurate in a virtual memory system. The only 
method that is relatively accurate, easy to use, and can 
account for the virtual memory mechanism is the system 
library timing function. There are two ways to use this 
library function and both methods display precision to the 
nearest one-hundredth of a second. The system manual 
claims that actually calling the system library timing 
function is more accurate than using the SECNDS Fortran 
language feature (which uses the system library function). 
[Ref. 11, p. C-30] Even though the claim of better 
accuracy is not substantiated by any specific numbers in 
the manual, the system library function was chosen for 
these tests. 

There is a certain amount of overhead as a result of 
each library call and since this overhead cannot be 
accurately measured, it results in inaccuracy which must be 
minimized. To time a segment of code requires two calls to 
the library routine with the code to be timed sandwiched 
between the two calls. The first call starts the timing 
and the second call records the time. To minimize the 
impact of the overhead in each use of a library function, 
the minimum time for the code segment execution must be 
much longer than the overhead. For this study, we 
determined by actually testing a series of timing calls 
that the upper limit of the overhead for each library call 
was less than 0.005 seconds. Therefore, we designed the 
Fortran version (without common or subroutine) of each test 
to last a minimum of two seconds. This means that the 
overhead for the two library calls in that version is less 
than one-half of one percent of the test length. All tests 
lasted longer than one second except for one test (the 
binary search microcode compacted version) and therefore 
the possible error due to the timer is less than one 
percent except in the one test that is shorter than one 
second. 
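
The error bounds quoted above follow directly from the measured overhead limit. The short sketch below reproduces the arithmetic; the function name is hypothetical and the defaults are the figures stated in the text. 

```python
def timer_error_fraction(test_seconds, per_call_overhead=0.005, calls=2):
    """Upper bound on the fraction of a timed run due to timer calls."""
    if test_seconds <= 0:
        raise ValueError("test_seconds must be positive")
    return (per_call_overhead * calls) / test_seconds
```

A two second test gives a bound of 0.005, or one-half of one percent, and a one second test gives 0.01, or one percent, matching the limits stated above. 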

Because some of the algorithms being tested can be 
accomplished very quickly (in less than 0.5 seconds) it is 
important to increase the execution time. This was done by 
repeating the algorithm several times to ensure that enough 
time was spent in the algorithm to produce accurate 
results. To accomplish this, the input data cannot be 
changed during the program iteration and all iterative 
counters must be reinitialized on each iteration. These 
extra instructions do add overhead to the test but the 
overhead is the same in each version of the test and 
therefore the impact was considered to be minimal. 
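
The repetition scheme can be sketched as a small timing harness. This is an illustration only: Python's time.process_time stands in for the VMS library timer used in the study, and the function and parameter names are hypothetical. 

```python
import time


def time_repeated(algorithm, make_input, repetitions):
    """Return CPU seconds spent running algorithm `repetitions` times.

    make_input must return a fresh copy of the input on every call so
    that each pass starts from the same, unmodified data.
    """
    start = time.process_time()      # first timer call starts the clock
    for _ in range(repetitions):
        data = make_input()          # reinitialize input on every pass
        algorithm(data)
    return time.process_time() - start  # second call records the time
```

The cost of rebuilding the input is included in the measurement, but it is paid equally by every version of a test, which is the same argument made above for the reinitialization overhead. 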

When the timing mechanism is invoked it produces any of 
five different values that are useful in analyzing the 
amount of time spent in an algorithm. The first value 
available is the elapsed time spent in the system, whether 
executing or waiting. The second value is the total CPU 
time that the algorithm being timed was executing. This is 
the most important value since it displays the actual CPU 
time the program required to execute. Next is the number 
of buffered I/O requests and the number of direct I/O 
requests. These numbers are not important in this study 
since no I/O is being done during the timing periods. The 
last available value is the total number of page faults 
occurring during the timed period. This number is valuable 
because it states how many times the job was interrupted 
and waited for a new page of memory to be fetched. The 
larger this value, the greater the chance for error because 
the clock must be stopped and started for each page fault. 
The fewer page faults and the closer the elapsed time is to 
the CPU time, the less chance of inaccuracies due to 
timing errors. 

B. LANGUAGE FEATURES AND THE EFFECT ON TIMING 

Before looking at the effects of the language features 
it is important to note that if a programmer does 'dumb' 
things, almost any algorithm can be programmed 
inefficiently in any language. It is a basic assumption 
during these tests that the algorithms are not being 
programmed poorly and every effort is made to use good, 
solid algorithms. Also, since the same algorithm is being 
programmed in both languages, any bad programming practices 
will be present in both versions and therefore tend to 
cancel each other out. 

The next consideration is the effect of language 
features on execution speed. In the JRS HLL, there are no 
features that will affect execution time except for the 
call to the FPA when doing floating point arithmetic. 
Floating point arithmetic requires a separate algorithm 
because of the data representation required. The data must 
be represented in one word and that one word includes both 
a mantissa and an exponent. The algorithm must separate 
the mantissa and the exponent, perform the operation after 
aligning the decimal point, and then store the mantissa and 
exponent back into the single word of memory. Floating 
point arithmetic is common in all block structured 
arithmetic languages and therefore Fortran has the same 
problem, but not to the same extent as the JRS HLL. 

The Fortran language is not as simple as the JRS HLL 
and therefore some of the Fortran language features affect 
the execution speed of the program. Fortran has several 
different data access and parameter passing modes that do 
affect the execution time of a program. It is important to 
design tests that show the effects of different uses of 
these features on the execution time of the same algorithm. 
Otherwise, the results of the tests will only be valid for 
the language features being used in that specific test and 
could not be generalized for any program in the testing 
category. 

Some of the language features of Fortran that affect 
the execution time are common blocks, subroutines with 
common blocks, and subroutines with parameters. Common 
blocks affect the execution time of a program because the 
blocks are placed in a specific location in memory which 
results in more indexing and slower data access for each 
item. If instead of using a common block the data is 
simply declared in memory, there will be a shorter access 
time for each data item and faster execution. 

The use of subroutines adds overhead due to the linkage 
conventions and activation record initialization that is 
required. Each time a subroutine is called, the current 
state (registers and program counter) must be saved in an 
activation record to ensure that the state can be 
restored when the subroutine is exited. When common 
blocks are combined with the use of subroutines there is 
both the overhead of the subroutine call and the overhead 
of accessing the data items in the common data area. These 
two added types of overhead result in increased execution 
time when compared with code that does not use the 
features. On the other hand, the features provide methods 
of passing data that are not otherwise available. 
Therefore the user must trade longer execution times for 
modularity in design and ease of passing data between 
subroutines. 

The use of subroutines with parameter passing results 
in even more overhead because of the requirement to set up 
the data area for the parameters, pass the parameters 
upon subroutine call, and return the new values of the 
parameters upon subroutine termination. Again, this 
language feature adds to the convenience and modularity of 
the program, yet results in longer execution time. 

It should be noted though that without common blocks or 
parameter passing there is no way to pass data between 
subroutines. Also, if a data area is large, parameter 
passing may be very costly, even to the point of being 
unusable. Another possibility is writing the program 
without using subroutines or data passing mechanisms. 
However this usually results in programs that are difficult 
and expensive to read and maintain. Since this is 
unacceptable for most software projects, most programs are 
written using some, if not all of these features. 

In order to document this tradeoff, all programs were 
tested in each of the following categories. 

1) Fortran without use of a common block or subroutine. 

2) Fortran using a common block but no subroutine. 

3) Fortran using a common block and a subroutine. 

4) JRS HLL using a common block and a subroutine. 

By testing each program using each of these methods, we can 
identify the amount of time added by the use of each 
language feature. The user can then weigh the use of the 
JRS HLL depending upon what features are desired. The most 
realistic comparison is between a Fortran program using 
subroutines with common blocks and the JRS HLL because the 
JRS HLL requires the use of both a subroutine call and a 
common block. Besides, most large Fortran programs are 
written using subroutines and common blocks so that the 
resulting program is modularized yet allows for easy data 
access. 
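
Given the four timings for one program, the cost attributed to each language feature is a matter of differences and a ratio. The sketch below is illustrative; the function and key names are hypothetical, and the sample call uses the While Loop timings reported in Table 5-1. 

```python
def feature_costs(no_common, common, common_subroutine, jrs_hll):
    """Break four timings (in seconds) into per-feature costs."""
    return {
        # Extra time from placing the data in a common block.
        "common_block": common - no_common,
        # Extra time from moving the algorithm into a subroutine.
        "subroutine": common_subroutine - common,
        # Speedup of the JRS HLL over the comparable Fortran version.
        "hll_speedup": common_subroutine / jrs_hll,
    }


while_loop = feature_costs(11.11, 18.18, 20.20, 11.12)
```

For the While Loop row this attributes about 7.07 seconds to the common block and 2.02 seconds to the subroutine call, and gives the JRS HLL a speedup of roughly 1.8 over Fortran with a common block and a subroutine. 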

One other requirement was determined during the testing 
due to the VMS operating system being a virtual memory 
system. When preliminary tests were made it was determined 
that other users seemed to have an effect on the execution 
time of the tests. Therefore, the tests were made under 
two different conditions. One condition was with the 
system clear of any other users. The other condition was 
with other users on the system. This was done to be able 
to document the difference, if any difference existed, and 
clear up the question about the effect of other users on 
the timing of a program. Chapter Five has a further 
explanation of the timing mechanism accuracy analysis. 

The main emphasis during the writing of the tests was 
on making the programs equivalent. All versions of each 
algorithm must do each step the same way so that the 
comparison is fair, yet the tests must be programmed as a 
'normal' programmer would do it in that language. If a 
program is written in a special way that is known to be 
optimal for one of the languages then the comparison of 
execution times would favor that language. However, if it 
is natural to use the feature in that language then that 
was the way it was done. One example of this policy is in 
testing the cosine function. Since the VAX 11/780 VMS 
operating system provides a cosine library function, the 
library cosine function was compared to the Chebyshev 
approximation to see which method is faster. Thus both 
methods (Chebyshev and the system library function) will 
achieve approximately the same answer; however, the 
algorithm used to achieve the answer will be different. 
This special case is done to measure the effect of not 
having a trigonometric function procedure available in the 
JRS HLL library. Included in this test is the resulting 
inaccuracy of the Chebyshev approximation, the space used 
to store the routine in the WCS, and the execution speed. 

With the specification of the testing methods, testing 
categories, and timing mechanisms, the next step is to 
compare the results. A comparison of execution times of 
each program in each category of testing was accomplished. 
During the explanation of the comparison, an analysis of 
the data and a summary of the results is given. This 
analysis will allow us to specify which applications the 
AMGS improves and which applications the AMGS does not 
improve. 



V. PERFORMANCE COMPARISON 



The results of the tests can now be analyzed and 
compared since the -factors affecting mi croprogrammi ng and 
the processes involved in testing have been reviewed. To 
insure that the analysis is complete, the raw data is 
presented first followed by an analysis of the results. 
The analysis will first compare the effects of language 
features on each of the tests and then compare the results 
of the different types of tests (ie. while loop, do loop, 
etc.). The final section of this chapter discusses the 
validity of the tests and analyzes the error in the tests. 

A. RAW DATA ANALYSIS 

The raw data is given in Table 5-1. All tests were 
programmed in the four categories explained in Chapter Four 
but only five of the algorithms were hand compacted. The 
times shown in Table 5-1 are the mean values of ten tests 
of each algorithm without other users using the VAX 11/780. 
The number in parentheses in the table, if shown, is the
value that should be added to or subtracted from the mean
value to define the 99% certainty range for the mean value.
If no number is shown in parentheses, then the value is one
hundredth of a second. An explanation of how the range was
determined is given in the last section of this chapter.

Table 5-1: Test Results (in seconds)

               NO COMMON    COMMON       COMMON                    HLL HAND
               STRAIGHT     STRAIGHT     SUBROUTINE   JRS          OPTIMIZED
PROGRAM        FORTRAN      FORTRAN      FORTRAN      HLL          MICROCODE

While Loop     11.11        18.18        20.20        11.12
Do Loop        7.07         10.10        12.12        10.12
Factorial      4.61         7.46 (.02)   7.54         8.88         8.63
Summation      4.49         9.77 (.02)   9.77 (.01)   5.70         2.87
Cosine         5.09         6.17         6.24         8.62
Cosine (Lib)   8.72         --           --           --
FFT            11.16 (.02)  13.08 (.02)  13.74 (.03)  17.37 (.02)
Sieve          3.39         4.18         4.10 (.02)   2.49
Binary Search  2.50         3.54         3.43         1.17         0.79
Bubble Sort    3.59 (.03)   4.58 (.03)   4.79 (.03)   3.77 (.02)   2.29
Quicker Sort   8.78 (.04)   9.75         9.80 (.02)   4.75         4.36
Bit Reversal   4.75 (.02)   7.49         7.50         2.25
Bit Manip      8.40 (.04)   8.53 (.07)   8.41 (.02)   2.98

Not all programs were hand compacted because the
compaction required special knowledge of VAX
microprogramming and also required a significant amount of
time. The tests to be compacted were chosen to insure that
a representative sample was taken from each of the
categories. Another criterion for choosing the tests for
compaction was to choose some tests which were faster in
Fortran and some tests that were faster in microcode to
compare the effect of compaction. The basic purpose was to
see, in general, how much better we could do with the
compaction without exerting a tremendous amount of effort.
That purpose was attained by compacting the five selected
programs.



When looking at the times from Table 5-1 in general,
some of the results were counterintuitive because the
expected result is to have the microcoded version execute
faster. In many cases the Fortran versions were faster or
as fast as the microcoded versions. This can be attributed
to three facts mentioned in earlier chapters. First, the
microcode that is generated from the HLL by this AMGS is
not compacted. Second, the Fortran compiler generates
highly optimized code. The third reason is that some of
the routines used as tests involve floating point
arithmetic or integer multiplication, both of which use the
floating point accelerator. The use of the floating point
accelerator results in increased overhead for microcode.
These three factors, separately or combined, resulted in
some cases where the Fortran outperformed the microcode.
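
The general picture described above can be checked directly
from Table 5-1. The following is a minimal sketch, not part
of the original test suite: it copies the fastest Fortran
time and the JRS HLL time for each test from the table and
computes the ratio between them. The names TIMES, speedup,
and faster_in_microcode are illustrative assumptions. A
ratio above 1.0 means the JRS HLL microcode ran faster than
the fastest Fortran version.

```python
# Times in seconds, copied from Table 5-1 (mean of ten runs).
# Each tuple is (fastest Fortran version, JRS HLL microcode).
TIMES = {
    "While Loop": (11.11, 11.12),
    "Do Loop": (7.07, 10.12),
    "Factorial": (4.61, 8.88),
    "Summation": (4.49, 5.70),
    "Cosine": (5.09, 8.62),
    "FFT": (11.16, 17.37),
    "Sieve": (3.39, 2.49),
    "Binary Search": (2.50, 1.17),
    "Bubble Sort": (3.59, 3.77),
    "Quicker Sort": (8.78, 4.75),
    "Bit Reversal": (4.75, 2.25),
    "Bit Manip": (8.40, 2.98),
}


def speedup(fortran_seconds, hll_seconds):
    """Ratio of Fortran time to microcode time (>1 favors microcode)."""
    return fortran_seconds / hll_seconds


def faster_in_microcode(times):
    """Names of the tests where the JRS HLL version beat Fortran."""
    return sorted(
        name for name, (fortran, hll) in times.items()
        if speedup(fortran, hll) > 1.0
    )


if __name__ == "__main__":
    for name, (fortran, hll) in TIMES.items():
        print(f"{name:14s} {speedup(fortran, hll):5.2f}")
```

Only the sorting, searching, and bit manipulation tests come
out above 1.0 in this calculation, and none of them reaches
a factor of three.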

B. EFFECTS OF LANGUAGE FEATURES ON THE TESTS 



The different Fortran language features were tested to
isolate the effects of the different techniques for data
passing. The important point is that the tests were
programmed as a 'normal' programmer would program them. No
special attempts were made to make specific tests run well
in either of the two languages. Since it was unlikely that
a determination could be made as to what the 'normal'
programmer would do, the three different Fortran tests were
devised so that the user could determine which method was
needed for his/her application. Of course, if a programmer
chose the Fortran without subroutines or common data areas,
then he/she was giving up the use of some very important
software engineering techniques.

In general, the tests of the different Fortran language 
features resulted in more speed with fewer features and 
less speed with more features. The fastest Fortran 
technique in all cases was the version that used no 
subroutines and no common data structure. The use of 
common data areas with and without subroutines resulted in 
somewhat unexpected data. The expected results were for 
the versions using common data without subroutines to run 
faster than the version using common data areas with 
subroutines. This occurred in most but not all of the 
tests. In general, there was only a slight increase in 
execution time when a subroutine with common was compared 
with the same program with no subroutines but with common 
data, which implies that the overhead of calling and 
returning from a subroutine (without any parameters) is not 
very significant. In fact, in most cases there was no
statistical difference (discussed in the last section of
this chapter) between the tests with subroutines and common
and the tests without subroutines but with common. One
possible explanation is that the variation in the length of
time to start and stop the timing mechanism is greater than
the length of time required to call and return from a
subroutine. Since there is only a single call in each
test, the results may not show any difference when the
subroutine is used.

The hand compaction of the JRS HLL microcode always
resulted in faster execution than the uncompacted JRS HLL
microcode. This is as expected since the hand compacted
code was derived from the JRS HLL microcode. In no case
were instructions expanded ("n" microinstructions encoded
into "n+k" microinstructions, where k > 0) and therefore no
increase in execution time for the hand compacted code was
expected. It should be realized that the method used for
generating the hand compacted microcode does not really
produce hand written microcode because the compaction was
done to an existing program. The microprogrammer did not
set up the problem according to his own liking. The
microprogrammer simply took the generated microcode and
compacted it using his knowledge of VAX microprogramming.
If microoperations could be combined with other
microoperations to reduce the total number of
microinstructions, they were combined. The important point
to remember is that the microcode was machine generated and
hand compacted.

Another point that must be mentioned about the data
analysis in general is the overhead involved in the JRS HLL
microcode. The length of time required to make the call to
the microcode plus the overhead involved in the use of
common data is not documented anywhere and can not be
determined in this study because the timing mechanism is
not accurate enough. Therefore during the analysis of the
data, it is important to remember that when the JRS HLL
microcode is called there is a certain amount of overhead
in the call. This overhead is most likely more than the
overhead of a subroutine call in Fortran because the state
of the micromachine must be initialized. The other point
is that all data in the microcode is in a common data area
and therefore, as has been documented, requires extra time
to access. Probably the best comparison between Fortran
and JRS HLL is to use Fortran with common data and
subroutines because the overhead of the common data and the
subroutine calls approximately cancel out each other.
Therefore, it is possible to compare the actual speed of
each method rather than comparing the overhead involved in
each method.

The overhead involved in the subroutine call and the
common data area will not always be constant. If there are
only a few data items being accessed in the subroutine then
all of the data values can be placed in registers which
reduces the access time. However, if an array or a large
number of variables are being accessed then it will take
longer to get the data in and out because of the use of a
common data area. The important point when looking at the
comparisons being made in the next few sections is that if
common data structures and subroutines are used in Fortran
(which is almost always done) then the execution speed will
not be as fast as the fastest Fortran test. If the
decision is made to not use the common data structures and
subroutines then the programmer will be giving up
modularity of design and other software engineering
techniques for faster execution.

C. COMPARISON OF TEST RESULTS 

This section will compare the results of the Fortran
versions with the HLL versions. The comparison will be
done within the four basic areas defined in Chapter Three.
Each test algorithm is available in an appendix in both the
Fortran implementation and the JRS HLL implementation. The
Fortran version of the algorithms available in the
appendices is the version in a subroutine with a common
data structure. The algorithms have been placed in the
appendices according to the basic area that they are
testing. Table 5-2 lists which appendices contain which
individual tests. The algorithms have been removed from the
individual test harnesses; however, an example harness
(Factorial Program) is available in Appendix E.

Table 5-2: Table of Tests in Appendices

Appendix A: Integer Mathematics
   1. Do Loop
   2. While Loop
   3. Summation
   4. Factorial

Appendix B: Floating Point Mathematics
   1. Fast Fourier Transform
   2. Chebyshev Cosine

Appendix C: Sorting/Searching
   1. Binary Search
   2. Quicker Sort
   3. Sieve of Eratosthenes
   4. Bubble Sort

Appendix D: Bit Manipulations
   1. Bit Manipulation
   2. Bit Reversal

1. Integer Mathematics

The basic loops were included in the integer
mathematics category because all that occurs in the loop
construct is an increment and test until the condition is
met, at which time a jump out of the loop is executed.
This is very simple and uncomplicated so the expectation
was that the microcoded version would not be much better
than the Fortran version. In fact, the JRS HLL WHILE loop
was only as fast as the fastest Fortran version while the
fastest Fortran DO loop was much better than the JRS HLL DO
loop. The results imply that the Fortran code is highly
optimized.

Since each of the loop tests involves only one
variable, the common data area access time penalty can not
be blamed since the variable was stored in a register.
There is the overhead of calling the subroutine and setting
up the data registers; however, that alone should not cause
the microcode to be as slow or slower than Fortran. The
only logical answer is that the optimization and compaction
of the different codes has a large effect on the execution
speed. One other important point about the loops is that
in all cases the DO loop is faster than the WHILE loop.
This is most likely due to better optimization because the
looping variable is part of the loop construct while in the
WHILE construct the incrementing of the variable occurs
independently from the language construct.

The next test was the summation of an integer
value. This test measured how fast addition could be done.
Since each summation could be done very quickly, a loop
construct was set up to repeat the summation 10,000 times.
The results were that the fastest Fortran version was
slightly faster than the JRS HLL version, even after
subtracting the overhead of the WHILE loop. This result
was not expected but can be explained as the result of lack
of code compaction because when the summation microcode was
compacted by hand, the execution speed became significantly
faster than the fastest Fortran version.

The final integer mathematics test is the factorial
program. The test was limited to a maximum input of 12
because 13! is beyond the limits of the storage capacity of
a four byte integer. Therefore, to make the test last long
enough for timing purposes, a loop was set up to calculate
the factorial 100,000 times prior to stopping.
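
The limit of 12 can be confirmed with a short calculation.
The sketch below assumes a signed four byte integer, whose
largest value is 2**31 - 1 = 2,147,483,647; the function
name largest_factorial_argument is illustrative and does not
come from the test programs.

```python
import math

INT32_MAX = 2**31 - 1  # largest signed four byte integer (assumed signed)


def largest_factorial_argument(limit=INT32_MAX):
    """Largest n for which n! still fits within the given limit."""
    n = 0
    while math.factorial(n + 1) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    n = largest_factorial_argument()
    print(n, math.factorial(n), math.factorial(n + 1))
```

Here 12! = 479,001,600 fits, while 13! = 6,227,020,800 does
not. The same limit holds for an unsigned four byte integer,
since 13! also exceeds 2**32 - 1.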

The result of the factorial test validates the JRS 
claim that integer multiplication is slow because of the 
use of the FPA. After subtracting the overhead of the 
loop, the JRS HLL is still twice as slow as the fastest 
Fortran version. In fact, the slowest Fortran version, 
using common data areas and a subroutine, is faster than 
the JRS HLL microcode. Therefore, the AMGS should not be 
used for integer multiplication intensive algorithms 
because of the FPA overhead. 

The JRS HLL did not result in any performance 
improvements for any of the integer arithmetic tests 
accomplished in this study. This was due either to a lack 
of microcode compaction or to the use of the FPA for 
integer multiplication. 

2. Floating Point Mathematics 

There were two algorithms for testing the floating
point mathematics applications, the Chebyshev Cosine
routine and the Fast Fourier Transform (FFT). Both
algorithms substantiated the JRS claim that floating point
calculations would not do well in microcode. The execution
time of the FFT HLL version was about twice as long as that
of the fastest Fortran version. The other Fortran versions
were of course slower than the fast version due to the use
of common and a subroutine call.

The Chebyshev Cosine routine gave the same type of 
results as were attained for the FFT, a slow down of about 
80%, caused by the FPA. However, the interesting part of 
this test is in comparing the speeds of the Chebyshev 
Cosine with the speed of the Cosine Library function. The 
overhead of the Library Function call is very high because 
even the JRS HLL (which is the slowest Chebyshev version) 
is faster than the Library Function test. Therefore it is 
justifiable to say that while the use of the HLL for doing 
trigonometric computations is not a great improvement, this 
test does demonstrate that the commonly used features of a 
language can be costly and that the microcode does give 
slightly better performance than the Library Function. 
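
For readers unfamiliar with the technique, the following is
a minimal sketch of a Chebyshev approximation to the cosine
and of how its inaccuracy can be measured against a library
cosine. It is not the routine in Appendix B. The polynomial
degree (8), the interval (-pi to pi), and all function names
are assumptions chosen for illustration.

```python
import math


def chebyshev_coefficients(f, degree, a, b):
    """Coefficients c_j so that f(x) ~ sum of c_j * T_j(t) on [a, b]."""
    n = degree + 1
    # Chebyshev nodes on [-1, 1] and the function sampled at those nodes.
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    coeffs = []
    for j in range(n):
        total = sum(
            values[k] * math.cos(math.pi * j * (k + 0.5) / n)
            for k in range(n)
        )
        coeffs.append(2.0 * total / n)
    coeffs[0] *= 0.5  # the constant term carries half weight
    return coeffs


def chebyshev_eval(coeffs, x, a, b):
    """Evaluate the series at x using the Clenshaw recurrence."""
    t = (2.0 * x - a - b) / (b - a)  # map x from [a, b] to [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]


def max_error(coeffs, a, b, samples=201):
    """Largest absolute difference from math.cos over a sample grid."""
    step = (b - a) / (samples - 1)
    return max(
        abs(chebyshev_eval(coeffs, a + i * step, a, b) - math.cos(a + i * step))
        for i in range(samples)
    )


if __name__ == "__main__":
    c = chebyshev_coefficients(math.cos, 8, -math.pi, math.pi)
    print(max_error(c, -math.pi, math.pi))
```

The approximation needs only multiplications and additions
once the coefficients are stored, which is why it is a
natural substitute when no trigonometric library routine is
available.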

Both tests in this basic area support the JRS claim 
that floating point arithmetic will not be helped when 
coded in JRS microcode. Since that point has been well 
documented, we will now look at the sorting and searching 
tests to see what kind of results they produce. 

3. Sorting and Searching 

There were four tests accomplished in this area and
three of the four gave results that were favorable for the
JRS HLL. The one test where the JRS HLL ended up being
slightly slower was the bubble sort algorithm. There was
no looping mechanism to subtract away from the problem and
the algorithm consists of only assignment statements and
comparisons. Therefore there is no explanation for the
slow performance other than the lack of compaction of the
microcode.

The Sieve of Eratosthenes program test resulted in
the JRS HLL version running about 25% quicker than the
fastest Fortran version. This result was expected since
the microcode is able to do comparisons rather quickly.
One other interesting point became apparent during this
test. Since the tests are supposed to be written as a
'normal' programmer would write them, it is sometimes
easier to use a DO loop rather than a WHILE loop or vice
versa. However, when trying to get code to execute
quickly, it is obvious that the Fortran DO loop is much
faster than the Fortran WHILE loop as shown in Table 5-1.
On the other hand, the JRS HLL DO loop is not nearly as
fast as the Fortran DO loop and only slightly faster than
the JRS HLL WHILE loop. Therefore, a program is dependent
upon the language construct chosen by the individual
programmer and if a DO loop is used in the Fortran version
while a WHILE loop is used in the JRS HLL version, there
will be a greater difference in results.

To avoid this discrepancy in results (after it was
noticed in the initial results), the Sieve algorithm was
rewritten in both languages using DO loops because a
definite iteration (the DO loop function) was what was
needed in the algorithm. The change in speed of the
algorithms due to the use of the DO loop was not
tremendous. However, this test does demonstrate the effect
of using different language constructs plus the use of
'normal' programming techniques and constructs.
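
The reason definite iteration fits the Sieve can be seen in
a short sketch. The code below is illustrative only and is
not the Appendix C listing; the function name and the use of
a Boolean flag array are assumptions. Both loops have bounds
that are known before the loop starts, which is the case the
DO loop construct is designed for.

```python
import math


def sieve_of_eratosthenes(limit):
    """Return all primes <= limit using only definite (counted) loops."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # Outer loop: candidate divisors up to the square root of the limit.
    for candidate in range(2, math.isqrt(limit) + 1):
        if is_prime[candidate]:
            # Inner loop: strike out every multiple of the candidate.
            for multiple in range(candidate * candidate, limit + 1, candidate):
                is_prime[multiple] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]


if __name__ == "__main__":
    print(sieve_of_eratosthenes(30))
```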

The Quicker Sort algorithm demonstrated the speed
of the microcode as was expected. Since only comparisons,
additions, and subtractions with one multiply are used,
this algorithm is very fast. The Binary Search algorithm
results ended up with the JRS HLL being twice as fast as
the fastest Fortran version. Again, this was expected
because of the use of comparisons during most of the
algorithm. This algorithm produced the second best
performance increase by the JRS HLL microcode of all of the
tests. This was probably due to the fact that the
algorithm has only one DO loop, one WHILE loop, and the
rest of the algorithm is made up of if-then constructs
which are simple comparisons.
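
The structure described above (one counted driving loop, one
WHILE loop, and if-then comparisons) can be sketched as
follows. This is an illustration of the general algorithm
under assumed names (binary_search, count_hits), not a copy
of the Appendix C program.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:                      # the single WHILE loop
        mid = (low + high) // 2
        if sorted_items[mid] == target:     # simple comparisons only
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def count_hits(sorted_items, targets):
    """Driving counted loop: search for each target in turn."""
    hits = 0
    for target in targets:                  # plays the role of the DO loop
        if binary_search(sorted_items, target) >= 0:
            hits += 1
    return hits


if __name__ == "__main__":
    data = list(range(0, 100, 2))           # even numbers 0..98
    print(binary_search(data, 42), count_hits(data, range(10)))
```

Apart from the midpoint calculation, every step is an
assignment or a comparison, which is consistent with the
explanation given for the good microcode performance.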

The sorting and searching tests were a good
application of the JRS HLL microcode. For the most part,
the microcode resulted in faster execution speed than the
corresponding Fortran program; however, the increase was
never much more than twice as fast.

4. Bit Manipulations

The last basic area of the tests is the bit
manipulation area. Two tests were accomplished in this
area and both gave positive results for the microcode
version. The bit reversal program ended up with a large
increase in execution speed. The program was simply used
to switch items in an array. No comparing was needed since
the program switched the items in the array according to a
convolution scheme. This test demonstrates the speed of
the assignment statement in the microcode.
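
The following sketch shows one common form of bit reversal,
the index reordering used ahead of a Fast Fourier Transform.
It is an assumption that the Appendix D program follows this
scheme; the function names are illustrative. Each element at
index i is exchanged with the element whose index is i with
its bits written in reverse order.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversal_permute(items):
    """Return a copy of items reordered by bit-reversed index.

    The length must be a power of two.
    """
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    width = n.bit_length() - 1
    out = list(items)
    for i in range(n):
        j = reverse_bits(i, width)
        if i < j:                      # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    print(bit_reversal_permute(list(range(8))))
```

In this form the array exchanges themselves are pure
assignments; the only comparison is the index test that
prevents a pair from being swapped twice.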

The bit manipulation program also resulted in
faster execution for the JRS HLL than for the fastest
Fortran version. The main reason for this fast execution
is that the Fortran version uses system library routines
which are slow to call and execute. Therefore, it is
actually the slowness of the Fortran library routine rather
than the speed of the microcode that gives the increased
throughput. The important point is that the microcode does
improve upon the execution speed of the corresponding
Fortran code and therefore the AMGS gives a performance
increase for these kinds of operations. It is also
important to note that this program was simply a series of
calls to the microcode or Fortran routines that perform the
functions. No other operations besides the driving DO loop
were needed in the algorithm and therefore it was a very
accurate test of the actual speed of the tested code.

D. TEST ERROR ANALYSIS

Because this testing was done on a virtual memory
system, there was a possibility of error due to the timing
mechanism being switched on and off many times. The
intention of the tests was to give the user an accurate
estimate of how much speed would be gained by using the
AMGS. To insure that the estimate is as accurate as
possible, a computation was made to determine the
confidence interval for the mean. Also, to determine if
the virtual memory system was affecting the results, a test
was performed that allows us to state, with a specified
amount of confidence, whether the virtual memory system
affects the results.

Since we made several runs of each test, we were able
to determine a mean execution time for each test and a
standard deviation for each test. However it is important
to do a statistical analysis to determine how confident we
are of these results. The question of confidence was
answered by using the Student T distribution (because of
the small sample size) to find the interval within which
the mean will fall with the specified amount of confidence.
For these tests, a confidence of 99% was desired. The
following formula was used to determine the range of the
mean execution time for 99% confidence. The value for 't'
is dependent on the level of confidence desired and was
read from a Student T distribution chart. [Ref. 12: p. 488]
'X' is the mean of the sample, 'S' is the sample standard
deviation, 'n' is the number of runs in the sample, and
'u' is the true mean being estimated.

X - t(S / SQRT(n)) < u < X + t(S / SQRT(n))
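
As a worked illustration of this calculation, the sketch
below computes the 99% range for one set of ten timings. The
timings are hypothetical and are not data from this study.
The critical value 3.250 is the standard two-sided 99%
Student T value for 9 degrees of freedom (ten runs), and the
half width uses the standard form t * S / SQRT(n). The
function names are illustrative.

```python
import math
import statistics

# Two-sided 99% Student T critical value for 9 degrees of freedom.
T_99_DF9 = 3.250


def confidence_half_width(samples, t_critical):
    """Value to add to and subtract from the mean: t * S / sqrt(n)."""
    n = len(samples)
    s = statistics.stdev(samples)   # sample standard deviation (n - 1)
    return t_critical * s / math.sqrt(n)


def confidence_interval(samples, t_critical):
    """Lower and upper bounds of the range for the true mean."""
    mean = statistics.mean(samples)
    half = confidence_half_width(samples, t_critical)
    return mean - half, mean + half


if __name__ == "__main__":
    # Hypothetical execution times in seconds for ten runs of one test.
    runs = [4.60, 4.61, 4.62, 4.61, 4.60, 4.62, 4.61, 4.61, 4.60, 4.62]
    low, high = confidence_interval(runs, T_99_DF9)
    print(f"{statistics.mean(runs):.2f} ({confidence_half_width(runs, T_99_DF9):.3f})")
    print(f"{low:.3f} .. {high:.3f}")
```

For these hypothetical runs the half width comes out a
little under one hundredth of a second, which is the size of
the default range quoted for Table 5-1.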

Finding the effect of the virtual memory system
required performing each test under two different
conditions. First, each test was made with other users on
the system. This could be anywhere from one other user to
twenty users. Next, each test was performed with all other
users locked out of the system and the entire computer
running only the system support programs and the tests for
this project. Then a hypothesis, called the null
hypothesis (H1), was assumed. The null hypothesis was that
both samples came from the same population. To test the
null hypothesis we used the following formula, where X1 and
X2 are means, S1 and S2 are standard deviations, and n1 and
n2 are sample sizes (in this test, 10).

t = (X1 - X2) / SQRT( (S1**2/n1 + S2**2/n2) )
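
A minimal sketch of this test follows. The two sets of
timings are hypothetical and are not data from this study.
The statistic uses the sample variances (the squared
standard deviations) under the square root, which is the
standard form. The chart value 2.878 is the two-sided 99%
Student T value for 18 degrees of freedom; using 18 degrees
of freedom (n1 + n2 - 2) is an assumption, as is comparing
the absolute value of 't' so that the order of the two
samples does not matter.

```python
import math
import statistics

# Two-sided 99% Student T critical value, assuming 18 degrees of freedom.
T_99_DF18 = 2.878


def two_sample_t(first, second):
    """t = (X1 - X2) / sqrt(S1**2/n1 + S2**2/n2)."""
    x1, x2 = statistics.mean(first), statistics.mean(second)
    v1, v2 = statistics.variance(first), statistics.variance(second)
    return (x1 - x2) / math.sqrt(v1 / len(first) + v2 / len(second))


def samples_differ(first, second, t_critical=T_99_DF18):
    """True when the null hypothesis (same population) can be rejected."""
    return abs(two_sample_t(first, second)) >= t_critical


if __name__ == "__main__":
    # Hypothetical timings in seconds: system locked, then with other users.
    locked = [4.60, 4.61, 4.62, 4.61, 4.60, 4.62, 4.61, 4.61, 4.60, 4.62]
    shared = [4.70, 4.75, 4.66, 4.81, 4.72, 4.69, 4.77, 4.73, 4.68, 4.79]
    print(two_sample_t(locked, shared), samples_differ(locked, shared))
```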

If the calculated 't' (from above) >= 't' (from the
chart based on 99% certainty), then the null hypothesis can
be rejected. In other words, the samples do not come from
the same population which means that the number of users on
the system does affect the results. If the calculated
't' < 't' (from the chart) then they could be from the same
population and the other users on the system may not affect
the results. [Ref. 12: pp. 214 - 221]

The results of the first confidence test mentioned
above were enumerated in Table 5-1 with data taken on the
VAX 11/780 with all other users locked out of the system.
In general, the results were very accurate in that they
gave a small range in which the anticipated results would
fall. The null hypothesis test gave mixed results. It was
hoped that we would be able to state that the tests with
other users on the system would be from a different sample
set than the tests without other users. However, that was
not the case in general. In most situations, the tests
with other users on the system simply showed a higher mean
but the possible range for 99% certainty included most or
all of the range for the test without other users.
Therefore, in the second test the null hypothesis could not
be refuted in most cases. However, it does appear that
other users on the system do affect the timing mechanism
but only because they increase the standard deviation of
the tests and thereby widen the range of values for 99%
certainty.

VI. CONCLUSIONS



The purpose of this project was to evaluate the
performance of the JRS AMGS. This has been accomplished by
comparing the performance of the JRS HLL microcode with
Fortran code on the VAX 11/780. The testing has produced
some unexpected results and has shed light on several
interesting points. The first point is that microcode
will not always result in faster execution of an algorithm.
During the testing it became apparent that this was due
mainly to two causes. One reason is that for the speed of
microcode to be fully utilized, the microcode must be
properly compacted. The other is that the use of the FPA
by the microcode results in slightly degraded performance.

The second point is the effect of the different
language features upon the execution speed of the Fortran
code. When the fastest Fortran code was compared with the
microcode there were several cases where the Fortran was
much faster than the microcode. However, when the slowest
Fortran code was compared with the HLL microcode the
microcode was faster. This was true in all cases except
when the FPA was required. Testing the effects of the
language features revealed an important point since the use
of the features allows a programmer to use software
engineering techniques. When these features are not used
it is very difficult for a programmer to use software
engineering techniques such as modularity and information
hiding. Without these techniques the code may run fast but
it is usually very difficult to develop and always hard to
maintain. Therefore, a tradeoff must be made between the
convenience and security of the language features and the
speed advantage possible without the features.

A. CONCLUSIONS FROM THE DATA ANALYSIS 

The analysis of the data allows for some conclusions to
be drawn about the use of the AMGS for specific
applications. The conclusions are grouped in terms of the
four general areas defined in Chapter Four rather than
about individual tests so that a user may make a decision
based upon a general category of application rather than a
specific example program. Specific program results will be
mentioned if the results of that test vary significantly
from the other tests in the specific area being discussed.

The integer mathematics application resulted in no
advantage from the use of the AMGS. This is most likely
due to the lack of compaction of the microcode. This
conclusion is justified because when the summation
program's microcode was compacted and subsequently
executed, the results were a significant increase in
execution speed. Therefore, it is assumed that if the code
were properly compacted the execution speed would be
improved. The only test in the integer mathematics
category that would not be greatly improved by the
compaction is the factorial test. This is due to the use
of the FPA for integer multiplication.

The floating point mathematics area also turned out not
to be a good application for the AMGS. This was expected
and the probability that this would happen is documented in
the JRS HLL manual. The difference in the magnitude of the
execution speeds is interesting because the JRS HLL runs
about 60% slower than the fastest Fortran version.

The sorting and searching application area demonstrated 
promising results for the AMGS. In three of the four tests 
the AMGS version was significantly faster than the fastest 
Fortran version. In one test (the bubble sort) , the 
Fortran was faster than the AMGS but this is probably due 
to a lack of compaction rather than due to a lack of 
applicability to the AMGS. From the results of these four 
tests, it is justifiable to say that sorting and searching 
are both good application areas for using the AMGS. 
However, it should be noted that at this point in the AMGS 
development, the difference in execution speeds is not as 
good as it could be with compacted microcode. 

The bit manipulation area also resulted in favorable
results for the AMGS. In fact, this was the best
applications area of the JRS HLL because both tests ended
up more than doubling the speed of the Fortran code. Of
course, one of the tests was slow in the Fortran version
because of the use of library functions; however, since
that was the only way to easily perform that function in
Fortran, that was the way it was programmed.

Now that we have defined the areas where improvement is
possible, the question remains whether the AMGS should be
used by NRL. The answer to this must be based on more
factors than simply execution speed. We must also consider
system cost, ease of use, and actual improvement possible.

Since the improvement is at the best two to three times 
better than the Fortran code, the cost in money and 
programming effort can not be justified by the possible 
gain. When the system is improved to include microcode 
compaction with a resultant increase in performance, then 
the AMGS cost may be justified. Somewhere in the area of 
an order of magnitude increase in speed is necessary before 
the cost of the system (money and programming effort) is 
justified. 

The AMGS did prove capable of producing microcode that
is as fast or slightly faster than the compiled Fortran
code. Therefore, if an application exists that will use a
microcoded machine, the AMGS is capable of producing a
large amount of 'acceptable' microcode. The AMGS can
produce the microcode very quickly in comparison to
conventional methods. Also, the AMGS can produce large
amounts of microcode at much less expense than is possible
with hand microcoding. The AMGS therefore provides a
mechanism for producing 'acceptable' microcode efficiently
and inexpensively.

One other possible use of the AMGS is to produce
microcode that can be hand compacted. If the
microprogrammers are available, the HLL can be used to
produce an uncompacted microprogram and then the
microprogrammers can be used to compact the HLL microcode.
This technique produced very good results during the study
and the cost in microprogrammer's time is much less than
writing a complete microprogram from scratch.



B. FUTURE RESEARCH POSSIBILITIES 

There are several areas that can be researched as a
continuation of this work. Some areas relate directly to
this type of microcode generating system but other areas
are points that became obvious during the study yet had to
be ignored to keep the scope of the thesis within reason.
One area of research is to evaluate the next version of the
JRS AMGS. The next version is now available and has
microcode compaction which should result in much better run
time results. Also, the revision has more language
constructs that more closely parallel the constructs
available in the more modern block structured languages.
These revisions should make the system easier to use and
give better results.

Since one of the suggested advantages of the AMGS is
portability of the JRS HLL microcode, it is important for
this system to be implemented on another machine so that
the work involved in doing such a job can be documented.
The possibility of implementing the AMGS on another machine
is already a stated goal, but until it is done, a proper
testing of both implementations can not be made. The
comparison of the results of the tests would document the
portability of the system and demonstrate the ease with
which the machine transition could be made. It would also
be advantageous to have another language such as Fortran or
Pascal used as the source code instead of the JRS HLL.
This would make the AMGS accessible to more people
resulting in a better chance of the system becoming more
widely used.

The cost of using different language features in 
Fortran was interesting even though it was a sidelight of 
the study. Further study could determine the exact 
cost of calling a subroutine with or without parameters. 
Also, the actual cost of using a common data area could be 
documented so that a user knows how much such a 
feature costs. Of course this kind of testing would 
be system dependent, but if that system used these language 
constructs for a significant amount of work, the results 
could be very helpful in making decisions during future 
programming efforts. 
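
The kind of measurement suggested here can be sketched in a few lines. The example below is only an illustration in Python, not the Fortran used in the study; the function names (`add_inline`, `add_call_no_args`, `add_call_with_args`, `measure`) are hypothetical, and the absolute numbers it produces say nothing about the VAX 11/780.

```python
import timeit


def add_inline(n):
    """Accumulate in a plain loop with no call in the loop body."""
    total = 0
    for i in range(n):
        total += i
    return total


def _step_no_args():
    """Trivial callee that takes no parameters."""
    return 1


def _step_with_args(total, i):
    """Trivial callee that takes two parameters."""
    return total + i


def add_call_no_args(n):
    """Same loop shape, but each iteration calls a parameterless routine."""
    total = 0
    for _ in range(n):
        total += _step_no_args()
    return total


def add_call_with_args(n):
    """Same loop shape, but each iteration passes two parameters."""
    total = 0
    for i in range(n):
        total = _step_with_args(total, i)
    return total


def measure(n=10_000, repeats=5):
    """Return the best wall-clock time in seconds for each variant."""
    variants = {
        "inline": add_inline,
        "call_no_args": add_call_no_args,
        "call_with_args": add_call_with_args,
    }
    return {
        name: min(timeit.repeat(lambda f=fn: f(n), number=1, repeat=repeats))
        for name, fn in variants.items()
    }
```

Calling `measure()` returns one timing per variant; the differences between the entries, not the absolute values, are what indicate the cost of the call mechanism on a given system.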

The final suggestion for further research concerns 
defining the application areas. Guidelines indicating which 
applications use which operations would be very 
helpful during future system performance evaluation 
efforts. 



APPENDIX A 



INTEGER MATHEMATICS ALGORITHMS 



C     THE DO LOOP IN A FORTRAN SUBROUTINE 
C     'I' IS THE LOOP VARIABLE WHILE 'K' 
C     IS THE TOTAL NUMBER OF TIMES THE LOOP 
C     WILL BE EXECUTED. 

      SUBROUTINE DOLOOP 

      COMMON /WCS/ I,K 

      DO I=1,K 

      END DO 

      END ! OF DOLOOP 



\ THIS PROGRAM IS A DO LOOP WRITTEN IN JRS HLL \ 

PROGRAM DOLOOP; 
INTEGER I,K; 

DO I = 1 TO K; 
ENDDO; 

STOP; 

END. \ OF DOLOOP \ 



C     THIS IS THE WHILE LOOP IN FORTRAN 
C     COUNT HOLDS THE TOTAL NUMBER OF TIMES 
C     THE LOOP WILL BE EXECUTED. ZERO HOLDS 
C     THE VALUE 0. 

      SUBROUTINE WHILELOOP 
      INTEGER COUNT, ZERO 

      COMMON /WCS/ COUNT, ZERO 

      DO WHILE (COUNT .GT. ZERO) 
         COUNT = COUNT - 1 
      END DO 

      END ! OF WHILELOOP 



\ THIS IS THE WHILE LOOP IN JRS HLL \ 

PROGRAM WHILELOOP; 

INTEGER COUNT, ZERO; 

DO WHILE (COUNT .GT. ZERO); 
   COUNT = COUNT - 1; 
ENDDO; 

STOP; 

END. 



C     THIS IS THE SUM ALGORITHM IN FORTRAN 
C     'COUNT' IS THE NUMBER OF TIMES THE SUMMATION WILL 
C     BE COMPUTED. 'VALUE' IS THE NUMBER TO BE SUMMED. 
C     'TEMP' IS A STORAGE LOCATION FOR 'VALUE'. 'TOTAL' 
C     IS THE VALUE OF THE SUMMATION. 'ZERO' HOLDS THE 
C     VALUE 0. 

      SUBROUTINE SUM 

      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO 

      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO 

      ZERO = 0 

      DO WHILE (COUNT .GT. ZERO) 

C        REINITIALIZE THE VARIABLES FOR THE SUM ROUTINE 

         COUNT = COUNT - 1 
         VALUE = TEMP 
         TOTAL = ZERO 

C        THIS IS THE ACTUAL SUMMING OF THE VALUE 

         DO WHILE (VALUE .GT. ZERO) 
            TOTAL = TOTAL + VALUE 
            VALUE = VALUE - 1 
         END DO 

      END DO 

      END ! OF SUM 



\ SUMMATION ALGORITHM IN JRS HLL \ 

PROGRAM SUMMATION; 

INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO; 

DO WHILE (COUNT .GT. ZERO); 
   COUNT = COUNT - 1; 
   VALUE = TEMP; 
   TOTAL = 0; 

   DO WHILE (VALUE .GT. ZERO); 
      TOTAL = TOTAL + VALUE; 
      VALUE = VALUE - 1; 
   ENDDO; 

ENDDO; 

STOP; 

END. 
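
The nested loop structure of the summation benchmark can be restated as a short Python sketch. The function name `repeated_sum` is an assumption made for illustration; the outer loop exists only to repeat the work for timing, as in the listings above.

```python
def repeated_sum(value, count):
    """Sum the integers value, value-1, ..., 1 and repeat that 'count' times.

    Mirrors the benchmark: the outer loop reinitializes the working
    variables, the inner loop does the actual summing.
    """
    total = 0
    while count > 0:
        count -= 1
        v = value          # 'TEMP' in the listings holds this reusable value
        total = 0
        while v > 0:
            total += v
            v -= 1
    return total
```

For example, `repeated_sum(5, 3)` computes 5+4+3+2+1 three times and returns the last result.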



C     THE FACTORIAL SUBROUTINE IN FORTRAN 
C     'COUNT' DETERMINES HOW MANY TIMES THE FACTORIAL 
C     OF 'VALUE' WILL BE DETERMINED. 'TOTAL' HOLDS 
C     THE ANSWER AND IS INITIALIZED TO 1. 'TEMP' 
C     HOLDS THE FACTORIAL VALUE TO BE DETERMINED 
C     FOR REUSE. 

      SUBROUTINE FAC 

      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, ONE 
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE 

      DO WHILE (COUNT .GT. ZERO) 

         COUNT = COUNT - 1 
         VALUE = TEMP 
         TOTAL = ONE 

         DO WHILE (VALUE .GT. ZERO) 
            TOTAL = TOTAL * VALUE 
            VALUE = VALUE - 1 
         END DO 

      END DO 

      END ! OF FAC 



\ THE FACTORIAL PROGRAM IN JRS HLL \ 

PROGRAM FACTORIAL; 

INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, I; 

DO WHILE (COUNT .GT. ZERO); 
   COUNT = COUNT - 1; 
   VALUE = TEMP; 
   TOTAL = 1; 

   DO WHILE (VALUE .GT. ZERO); 
      TOTAL = TOTAL * VALUE; 
      VALUE = VALUE - 1; 
   ENDDO; 

ENDDO; 

STOP; 
END. 



APPENDIX B 



FLOATING POINT MATHEMATICS ALGORITHMS 



      SUBROUTINE FFT 

C************************************************* 
C     FAST FOURIER TRANSFORM 
C************************************************* 
C     X - COMPLEX ARRAY X(2**M) 
C     M - ORDER OF FFT, N = 2**M 
C 
C     BASED UPON AN FFT FIRST DEVELOPED BY SIGNALS 
C     SCIENCE CORPORATION FOR PROJECT SALESCLERK. 
C     FIRST TRANSCRIBED BY LCDR C LAURVICK, USN 
C     MODIFIED BY LT M HARTONG, USN 

REAL X R E A L ( 0 0 ">6 ) , x j m a r; ( ■) 0 Q6 ) , T R E A L , I I v, /* n , T R ?F A L , 
l T PI MAO, IRE AL , lJ I V AG 

DA | A PI/5.t<USR?6S/ 

\j = ? * * m 



M STAGE FOURIER TRANSFORM 



D j 00 L = t , M 

u- ,»=? * * ( *+ 1 -l ) 

l. E 1 = L F 0 / R 
U B h A L = l . 0 
0 I vt A r, = 0 . 0 

PHASt - R l / F L 0 A T f L E | ) 

/v = C mrlx ( C03 (RB ASF ), -S l T ( phase ) ) 



IF (PHASE . G T . 
n BASER = 

EL SF. 

3 BASER = 

E f'l R l F 



( ° 1 / R . 0 ) ) IHF\i 
R I - 3 B A S F 

phase 



1 



CALCBl 



C u S X = 0.RQQ3S/9S - ( 0 . US ‘)R '1 0 US * PHASER * PiAsEp) 
COS* - C 0 3 < t (0.0S9 -)Rr7<4 * 0 BASER * PHASE? * 
Phaser * phase? ) 

IF ( PHASE .ST. (Pt/P.n) ) THE .- 1 COSY = -CUSx 
Ale 3 T ! 



B() 



IF (PHASE .LT. (PI/2.0) ) THEM 

PHASE2 = 3 I /2 . 0 - PHASE 
ELSE 

PHASE2 = PI - (5.0 * PI/2.0 -PHASE) 

EMDIF 



SI'IX = 0.P999S79S - ( 0 . 4 9 ° 2 0 0 'J 5 *PHASE2 * PHASE?) 
SIMX =■ SIimX + ( 0.05^62679 * PHASE? * D H A S E 2 * 

1 PHASE? * PHASE?) 

C     DECIMATION IN TIME 

      DO 20 J=1,LE1 
      DO 10 I=J,N,LE0 
         IP = I+LE1 
         TREAL = XREAL(I) + XREAL(IP) 
         TIMAG = XIMAG(I) + XIMAG(IP) 
         T2REAL = XREAL(I) - XREAL(IP) 
         T2IMAG = XIMAG(I) - XIMAG(IP) 
         XREAL(IP) = T2REAL * UREAL - T2IMAG * UIMAG 
         XIMAG(IP) = T2REAL * UIMAG + T2IMAG * UREAL 
         XREAL(I) = TREAL 
         XIMAG(I) = TIMAG 
   10 CONTINUE 



THE At. = 


.IHEAL x cos * 


- (,I I mag * 


-31 i x ) 


.11 MAG = 


niHEAL * -SI 


MX) t J I - 1 A G 


* C 0 S X 


1 I M,|f 








IR ■■! 









      END 



PROGRAM FFT; 

\ FAST FOURIER TRANSFORM - AMGS HLL VERSION \ 

\ ************************************************* \ 

I 1 1 EGER I , J , M, |\|,L , LEO , LE t , IP / <0 J'JT , K , NvP,NM,NM I , 

REAL URF.Al, If MAG, PHASE , COS X,S 1,4 <, TREAL, T1 MAG, 

1 TPREAL, T2IMAG? 

REAL 1 M P , P l , R 1 , R 2 , R 3 , K 3 ; 

REAL ARRAY X K E A l ( '4 0 9 b ) , X I m A G ( a 0 a a ) ; 

\ GOES ,'MT i)o PIT REVERSAL 

\ \l r R * * M 
Ai = t ; 

0) <(.100 1 = 1 10 M 

J = 2 * i; 

E'"J')o; 

\ m STAGE FO IR1LP IRA'ISFORM 

\ E V I COTE 1 HE LOOP in If'^ES FOR TI’ImG FMRPUSfS 

0 1 x = I f U I -1 ; 

Oil L = I TO ; 

\ Rf.PL ME *■ l T H I JL T iM r FXRA'SIO'J 0 IF 1.) ,\iu C XP(HL "IS 
\ LE ( ' = 2 * * ( '111 -L ) ; 

L F -. 0 = 1 ; 

,50 -V A IJ N T = 1 TO ( V f 1 -L ) ; 

L t 0 = R * l E n i 

c. o r H) o ; 

I. : 1 =LEv)/P; 

u 1 •** ag - 0 . u ; 

' J RF A L = I . 0 ; 

p>ust= r r / R L o a l (i E I ) ; 

\ A=C v ’Pl » ( COS ( R 4 A SF ) , -S I N ( P |J A St ) ) \ 

\ (G -EMU SIR A*,n CAS \ 



IF (PHASE . OT. (° 1/2.0)) lHE' 1 



phase; 



I vtp = 

EMODO 

ELSE 

•n; 

T v*o 

EM OOO 



D l 



phase; 





C 0 S X = R 1 


(RP x I HP x 


T HP ) t 


1 


(RS x 


IMP * r ’ P X T V ’P 


x 1 HP ) ; 




IF ( p H A S c 


. GI . (PI/P.O)) 


MEM 




nu; 

C03X = 
F'i nr ); 


- C U s x ; 





\ CALC J LA TE SIM \ 

IP ( PHASE .LI. (Pi/2.0) ) THEM 

no ; 

ivp r pi/p.o - phase; 

E M P 0 0 

FLSE 

no; 

rvp = pf - f k 5 * pi/?." - J h a s p ) ; 
E mo; 

SI MX = PI - (RP x MP * T -ip ) *- 

1 f ps * T * M? * T vip * MP) ; 



n c C I A T I 0 ’•■! I H Tin \ 

0 ) J = 1 T J LE 1 ; 

m i = f it ■) py lfp; 

I f +lE i ; 



T ?[ AL = xRFAL(I) + 

i t - -i a r; = < j a o ( n + 
I 3 P E A l - 



T PI HAS - 
XRK At ( [P ) 

( r p i p a s * 

XI HA i; (IP) 

r r 2 1 h a s * 

x p E A L ( I ) 

X \ H A G ( I ) 



X H E A L ( I ) - 

x I v* A S ( I ) - 

= (1PPEAL 

n las ) ; 

= (TPPEAL 

■iRf ali ; 

= I k E a l ; 

= n H A r, ; 



x p E a l ( I p j ; 

x 1 ’AST i p 7 ; 

x p E A L ( I p ) ; 
x l h a g ( IP) ; 

* J ;i E A L ) - 

x J 1 M A G ) * 



ENDDO; 

UREAL = (UREAL * COSX) - (UIMAG * (-SINX)); 
UIMAG = (UREAL * (-SINX)) + (UIMAG * COSX); 

ENDDO; 

ENDDO; 

ENDDO; 

STOP; 

END. 
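
The loop structure of the transform (M stages, a span LE0 that halves at each stage, a butterfly on elements I and IP = I + LE1, and a twiddle factor U advanced once per J) can be sketched in Python as follows. This is only an illustrative reading of the listings, not the benchmark itself: it uses `cmath.exp` for the twiddle factor instead of the polynomial sine and cosine approximations, folds the separate bit reversal step into a helper, and the names `fft_dif` and `bit_reverse_permute` are assumptions. With the subtraction multiplied by the twiddle factor, the butterfly is the form usually called decimation in frequency.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder list x in place into bit-reversed index order (0-based)."""
    n = len(x)
    j = 0
    for i in range(n - 1):
        if i < j:
            x[i], x[j] = x[j], x[i]
        k = n // 2
        while k <= j:
            j -= k
            k //= 2
        j += k


def fft_dif(samples):
    """Radix-2 FFT of a sequence whose length is a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    m = n.bit_length() - 1                 # n == 2**m
    x = [complex(v) for v in samples]
    for stage in range(1, m + 1):
        le0 = 2 ** (m + 1 - stage)         # butterfly group size
        le1 = le0 // 2                     # distance between butterfly partners
        u = 1 + 0j                         # running twiddle factor
        w = cmath.exp(-1j * cmath.pi / le1)
        for j in range(le1):
            for i in range(j, n, le0):
                ip = i + le1
                t = x[i] + x[ip]
                x[ip] = (x[i] - x[ip]) * u
                x[i] = t
            u *= w
    bit_reverse_permute(x)                 # results come out bit-reversed
    return x
```

Note that the sketch updates the twiddle factor with a single complex multiplication, so the new real part is not used when forming the new imaginary part.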



C     THIS IS THE FORTRAN CHEBYSHEV COSINE ROUTINE. 
C     THE COSINE OF ALL INTEGER ANGLES FROM 0 TO 
C     180 DEGREES IS COMPUTED. THE COSINE OF EACH 
C     ANGLE IS COMPUTED 'K' TIMES FOR TIMING 
C     PURPOSES. 

      SUBROUTINE COSINE 

      INTEGER I,J,K,L,M,N 

      REAL PI, TEMP, R1,R2,R3, LIMIT, FANS 

      COMMON /WCS/ I,J,K,L,M,N,PI,TEMP,R1,R2,R3, 
     1             LIMIT,FANS(1:180) 

      DO WHILE (J .LE. K) 

         I = 0 

         DO WHILE (I .LE. 180) 

            IF (I .LE. 90) THEN 
               TEMP = ((I * PI)/LIMIT) 
            ELSE 
               TEMP = (((N-I)*PI)/LIMIT) 
            END IF 

            FANS(I) = R1 - (R2 * TEMP * TEMP) + 
     1                (R3 * TEMP * TEMP * TEMP * TEMP) 

            IF (I .GT. 90) FANS(I) = FANS(I) * (-1) 

            I = I + 1 
         END DO 

         J = J + 1 

      END DO 

      END ! OF COSINE 






\ THIS SUBROUTINE IS WRITTEN IN JRS HLL AND CALCULATES THE 
COSINE OF THE ANGLES FROM L TO M DEGREES. THE DEGREES 
ARE FIRST CONVERTED TO RADIANS AND THEN THE CHEBYSHEV 
APPROXIMATION IS USED TO FIND THE VALUE OF THE COSINE. 

THE LOOP IS EXECUTED K TIMES TO ALLOW FOR TIMING OF THE 
PROCEDURE. \ 

PROGRAM COSINE; 

INTEGER I,J,K,L,M,N; 

REAL PI,TEMP,R1,R2,R3,FACTOR; 

REAL ARRAY HANS(180); 

DO J = 1 TO K; \ LOOP TO CONTROL THE NUMBER OF TIMES 
                 THE COSINES ARE CALCULATED \ 

   DO I = L TO M; \ LOOP TO CONTROL WHAT ANGLES 
                    TO CALCULATE THE COSINE FOR \ 

      IF (I .LE. 90) THEN 
         TEMP = ((FLOAT(I) * PI)/FACTOR) 
      ELSE 
         TEMP = ((FLOAT(N-I) * PI)/FACTOR); 

      HANS(I) = R1 - (R2 * TEMP * TEMP) + 
                (R3 * TEMP * TEMP * TEMP * TEMP); 

      \ CORRECT THE SIGN OF THE ANSWER \ 

      IF (I .GT. 90) THEN 
         HANS(I) = (- HANS(I)); 

   ENDDO; 
ENDDO; 

STOP; 
END. 
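
The range reduction and even polynomial used by the cosine routines can be illustrated with a short Python sketch. The listings do not show the values loaded into R1, R2 and R3, so the coefficients below are plain Taylor-series stand-ins (1, 1/2, 1/24) chosen only for illustration; they are an assumption, not the Chebyshev coefficients of the study, and the function name `poly_cos_degrees` is hypothetical.

```python
import math

# Assumed stand-in coefficients (Taylor series), NOT the thesis values.
R1, R2, R3 = 1.0, 1.0 / 2.0, 1.0 / 24.0


def poly_cos_degrees(angle_deg, r1=R1, r2=R2, r3=R3):
    """Approximate cos(angle) for an integer angle in 0..180 degrees.

    Angles above 90 degrees are folded back to 180 - angle, the even
    polynomial r1 - r2*t**2 + r3*t**4 is evaluated on the folded angle
    in radians, and the sign is corrected afterwards.
    """
    if not 0 <= angle_deg <= 180:
        raise ValueError("angle must be between 0 and 180 degrees")
    folded = angle_deg if angle_deg <= 90 else 180 - angle_deg
    t = folded * math.pi / 180.0
    value = r1 - (r2 * t * t) + (r3 * t * t * t * t)
    return -value if angle_deg > 90 else value
```

Folding keeps the polynomial argument within 0 to pi/2, where a low-order fit is most accurate; fitted coefficients would reduce the error further than the stand-ins used here.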



APPENDIX C 



SORTING/SEARCHING ALGORITHMS 



C     THIS SUBROUTINE LOOKS FOR EACH ITEM IN THE 
C     ARRAY 'KEYS' STARTING WITH THE FIRST ITEM AND 
C     ENDING UP WITH THE LAST ITEM. EACH TIME AN 
C     ITEM IS FOUND THE INDEX OF THE DESIRED ITEM 
C     IS COMPARED WITH THE INDEX OF THE FOUND ITEM 
C     TO INSURE THAT THE CORRECT ITEM WAS FOUND. 
C     IF AN IMPROPER ITEM IS FOUND THEN THE COUNT 
C     OF ERRORS IS INCREMENTED BY ONE. 

      SUBROUTINE BINARY SEARCH 

      INTEGER KOUNT,RESULT,SIZE1,KEYS,K,UPPER,LOWER,I,J, 
     1        ERRORS,F 

      COMMON /WCS/ KEYS(10000),KOUNT,RESULT,SIZE1,UPPER, 
     1             LOWER,I,J,F,ERRORS,K 

C     LOOP THROUGH EVERY ELEMENT OF THE ARRAY 
C     AND LOOK FOR EACH ELEMENT ONCE. 

      DO J = 1,SIZE1 

C        INITIALIZE THE CONSTANTS AND VARIABLES 

         KOUNT = 0 
         RESULT = KEYS(J) 
         UPPER = SIZE1 
         LOWER = 1 
         F = .FALSE. 

         IF (RESULT .LT. KEYS(LOWER)) THEN 
            RETURN 
         ELSE IF (RESULT .GT. KEYS(UPPER)) THEN 
            RETURN 
         ELSE 
            DO WHILE (F .EQ. .FALSE.) 
               I = (UPPER + LOWER + 1)/2 
               IF (RESULT .LT. KEYS(I)) THEN 
                  UPPER = I-1 
               ELSE IF (RESULT .GT. KEYS(I)) THEN 
                  LOWER = I+1 
               ELSE IF (RESULT .EQ. KEYS(I)) THEN 
                  F = .TRUE. 
               ELSE 

RESULT = -*> 

                  F = .TRUE. 
               END IF 
               IF (UPPER .LT. LOWER) THEN 
                  F = .TRUE. 
               ELSE 
                  KOUNT = KOUNT+1 
               END IF 
            END DO 
         END IF 

         IF (I .NE. J) ERRORS = ERRORS + 1 

      END DO 

      END ! OF BINARY SEARCH SUBROUTINE 



\ BINARY SEARCH PROGRAM WRITTEN IN JRS HLL \ 
\ WHEN THE RESULT IS ASSIGNED A NEGATIVE \ 
\ VALUE, THERE IS AN ERROR IN THE RESULTS \ 

PROGRAM BSEARCH; 

INTEGER ARRAY KEYS(10000); 

INTEGER KOUNT,RESULT,SIZE1,UPPER,LOWER, 
        I,J,FLAG,ERRORS,K; 

DO K = 1 TO SIZE1; 

   KOUNT = 0; 
   RESULT = KEYS(K); 
   UPPER = SIZE1; 
   LOWER = 1; 
   FLAG = 0; 

   IF (RESULT .LT. KEYS(LOWER)) THEN 
      RESULT = -1 
   ELSE 
   IF (RESULT .GT. KEYS(UPPER)) THEN 
      RESULT = -2 
   ELSE 
   DO WHILE (FLAG .EQ. 0); 

      I = (UPPER + LOWER + 1)/2; 

      IF (RESULT .LT. KEYS(I)) THEN 
         UPPER = I-1 
      ELSE 
      IF (RESULT .GT. KEYS(I)) THEN 
         LOWER = I+1 
      ELSE 
      IF (RESULT .EQ. KEYS(I)) THEN 
         FLAG = 1 
      ELSE 
      DO; 

RESULT = 
         FLAG = 1; 
      ENDDO; 

      IF (UPPER .LT. LOWER) THEN 
         FLAG = 1 
      ELSE 
         KOUNT = KOUNT+1; 

   ENDDO; 

   IF (K .NE. I) THEN 
      ERRORS = ERRORS + 1; 

ENDDO; 

STOP; 

END. 
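
The search benchmark and its self-check (look up every key once and count any lookup that lands on the wrong index) can be sketched in Python. The function names are hypothetical; the midpoint rounding `(upper + lower + 1) // 2` follows the listings, and a return value of -1 is an assumption used here to signal a missing key.

```python
def binary_search(keys, target):
    """Search the sorted list 'keys' for 'target'.

    Returns (index, probes); index is -1 when the target is absent.
    """
    lower, upper = 0, len(keys) - 1
    probes = 0
    while lower <= upper:
        mid = (upper + lower + 1) // 2     # same rounding as the listings
        probes += 1
        if target < keys[mid]:
            upper = mid - 1
        elif target > keys[mid]:
            lower = mid + 1
        else:
            return mid, probes
    return -1, probes


def self_check(keys):
    """Look up every element once and count lookups that return a wrong index."""
    errors = 0
    for j, key in enumerate(keys):
        index, _ = binary_search(keys, key)
        if index != j:
            errors += 1
    return errors
```

For a sorted list of distinct keys, `self_check` returns 0, which corresponds to the error count staying at zero in the benchmark.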



C     THIS IS THE QUICK SORT ALGORITHM IN FORTRAN 
C     'A' IS THE ARRAY HOLDING THE ITEMS TO BE SORTED 
C     ALL INTEGERS IN 'A' ARE GENERATED BY THE HARNESS 
C     PROGRAM 

      SUBROUTINE SORT 

      INTEGER I,M,J,P,T,Q,K,Q1,X,N,LT,UT,A 

      COMMON /WCS/ I,M,J,P,T,Q,K,Q1,X,N,LT(14),UT(14), 
     1             A(50000) 

C     INITIALIZE THE VARIABLES AND CONSTANTS 

      J = N 
      I = 1 
      M = 1 

  200 IF (J - I .GT. 1) THEN 
         P = (J + I)/2 
         T = A(P) 
         A(P) = A(I) 
         Q = J 
         DO 500 K = I+1, J 
            IF (A(K) .GT. T) THEN 
               DO 201 Q1 = Q, K, -1 
                  IF (A(Q1) .LT. T) THEN 
                     X = A(K) 
                     A(K) = A(Q1) 
                     A(Q1) = X 
                     Q = Q1 - 1 
                     GOTO 120 
                  END IF 
  201          CONTINUE 
               Q = K - 1 
               GOTO 140 
  120       ENDIF 
  500    CONTINUE 
  140    A(I) = A(Q) 
         A(Q) = T 
         IF (2*Q .GT. I+J) THEN 
            LT(M) = I 
            UT(M) = Q - 1 
            I = Q + 1 
         ELSE 
            LT(M) = Q + 1 
            UT(M) = J 
            J = Q - 1 
         ENDIF 
         M = M + 1 
         GO TO 200 
      ELSE 
         IF (I .GE. J) THEN 
            GO TO 160 
         ELSE 
            IF (A(I) .GT. A(J)) THEN 
               X = A(I) 
               A(I) = A(J) 
               A(J) = X 
            ENDIF 
  160       M = M - 1 
            IF (M .GT. 0) THEN 
               I = LT(M) 
               J = UT(M) 
               GO TO 200 
            ENDIF 
         ENDIF 
      END IF 

      END ! OF SORT 



PROGRAM SORT; 

\ THIS PROGRAM SORTS THE ELEMENTS OF AN ARRAY INTO 
ASCENDING ORDER. THE METHOD USED IS THE "QUICKERSORT" 
ALGORITHM OF R.S. SCOWEN, ALGORITHM 271, CACM, VOL. 
8, NUMBER 11, OCTOBER 1965. THIS VERSION WAS COPIED 
FROM THE JRS HLL MANUAL FOR THE AMGS SYSTEM. 

THE ALGORITHM WORKS BY CONTINUALLY SPLITTING THE ARRAY 
INTO PARTS SUCH THAT ALL ELEMENTS OF ONE PART ARE LESS 
THAN ALL ELEMENTS OF THE OTHER, WITH A THIRD PART 
IN THE MIDDLE CONSISTING OF A SINGLE ELEMENT. 

THE ARRAY TO BE SORTED IS PRE-SET IN 'A' AND THE NUMBER 
OF ELEMENTS IN THE ARRAY IS SET IN 'N'. ON EXIT, THE 
ELEMENTS OF ARRAY 'A' ARE SORTED. \ 

INTEGER I,M,J,P,T,Q,K,Q1,X,N; 

INTEGER ARRAY LT(14),UT(14),A(50000); 

J = N; 
I = 1; 
M = 1; 

100: IF (J-I.GT.1) THEN 
DO; 
   P = (J+I)/2; 
   T = A(P); 
   A(P) = A(I); 
   Q = J; 
   DO K = I+1 TO J; 
      IF (A(K).GT.T) THEN 
      DO; 
         DO Q1=Q DOWNTO K; 
            IF (A(Q1).LT.T) THEN 
            DO; 
               X = A(K); 
               A(K) = A(Q1); 
               A(Q1) = X; 
               Q = Q1-1; 
               GOTO 120; 
            ENDDO; 
         ENDDO; 
         Q = K-1; 
         GOTO 140; 
      ENDDO; 
120:  CONTINUE; 
   ENDDO; 

140: A(I) = A(Q); 
   A(Q) = T; 
   IF (2*Q .GT. I+J) THEN 
   DO; 
      LT(M) = I; 
      UT(M) = Q-1; 
      I = Q+1; 
   ENDDO 
   ELSE 
   DO; 
      LT(M) = Q+1; 
      UT(M) = J; 
      J = Q-1; 
   ENDDO; 
   M = M+1; 
   GOTO 100; 
ENDDO 
ELSE IF (I.GE.J) THEN GOTO 160 
ELSE 
DO; 
   IF (A(I).GT.A(J)) THEN 
   DO; 
      X = A(I); 
      A(I) = A(J); 
      A(J) = X; 
   ENDDO; 
160: M = M-1; 
   IF (M.GT.0) THEN 
   DO; 
      I = LT(M); 
      J = UT(M); 
      GOTO 100; 
   ENDDO; 
ENDDO; 

STOP; 
END. 
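
The control flow of the sort (partition around the middle element, defer the larger part on an explicit stack in place of LT and UT, continue with the smaller part, and finish two-element segments with a single compare and swap) can be restated without GOTOs. The Python sketch below is one possible reading of the listings, using 0-based indexing and a hypothetical function name; it is not taken from the JRS manual.

```python
def quickersort(a):
    """Sort list 'a' in place into ascending order and return it."""
    stack = []                       # deferred (lower, upper) segments
    i, j = 0, len(a) - 1
    while True:
        if j - i > 1:
            p = (i + j) // 2
            t = a[p]                 # pivot value
            a[p] = a[i]
            q = j
            for k in range(i + 1, j + 1):
                if a[k] > t:
                    # Search downward from q for an element below the pivot.
                    for q1 in range(q, k - 1, -1):
                        if a[q1] < t:
                            a[k], a[q1] = a[q1], a[k]
                            q = q1 - 1
                            break
                    else:
                        # None found: the split point lies just below k.
                        q = k - 1
                        break
            a[i] = a[q]
            a[q] = t                 # pivot is now in its final position
            if 2 * q > i + j:
                stack.append((i, q - 1))   # defer the larger lower part
                i = q + 1
            else:
                stack.append((q + 1, j))   # defer the larger upper part
                j = q - 1
            continue
        if i < j and a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
        if not stack:
            return a
        i, j = stack.pop()
```

Because the larger part is always the one deferred, the stack depth grows only with the logarithm of the array length, which is why the listings can get by with 14-entry LT and UT arrays.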



C     SIEVE PROGRAM IN FORTRAN IV 
C     COPIED FROM BYTE MAGAZINE, JAN 83 
C     THE SIEVE OF ERATOSTHENES ALGORITHM IDENTIFIES 
C     THE PRIME NUMBERS FROM 3 TO M. IN THIS CASE 
C     M = 16,381. THE PRIMES ARE STORED IN 
C     AN ARRAY NAMED 'PRIMES' FOR VERIFICATION 
C     OF THE ALGORITHM IN THE HARNESS PROGRAM 

      SUBROUTINE SIEVESUB 

      INTEGER I,J,K,COUNT,ITER,PRIME,N,PRIMES 
      LOGICAL FLAGS 

      COMMON /F/ FLAGS(8191) 
      COMMON /STORE/ I,J,K,COUNT,ITER,PRIME,N,PRIMES(1900) 

      DO 192 ITER = 1,20 
         COUNT = 0 
         N = 1 
         DO 110 I = 1,8191 
  110    FLAGS(I) = .TRUE. 
         DO 191 I = 1,8191 
            IF (.NOT. FLAGS(I)) GOTO 191 
            PRIME = I + I + 1 
            PRIMES(N) = PRIME 
            N = N + 1 
            COUNT = COUNT + 1 
            K = I + PRIME 
            IF (K .GT. 8191) GOTO 191 
            DO 180 J = K,8191,PRIME 
  180       FLAGS(J) = .FALSE. 
  191    CONTINUE 
  192 CONTINUE 

      END ! OF SIEVESUB 



\ HLL VERSION OF THE SIEVE OF ERATOSTHENES 

THE PROGRAM IDENTIFIES THE PRIME NUMBERS BETWEEN 
1 AND N. \ 

PROGRAM SIEVE; 

INTEGER I,J,K,COUNT,L,PRIME,ZERO,M,TEN; 
INTEGER ARRAY FLAGS(8191),PRIMES(1900); 

DO L = 1 TO TEN; 

   COUNT = 0; 
   J = 1; 

   DO I = 1 TO M; 
      FLAGS(I) = 1; 
   ENDDO; 

   DO I = 1 TO M; 
      IF (FLAGS(I) .EQ. 1) THEN 
      DO; 
         PRIME = I + I + 1; 
         PRIMES(J) = PRIME; 
         J = J + 1; 
         COUNT = COUNT + 1; 
         K = I + PRIME; 
         DO WHILE (K .LE. M); 
            FLAGS(K) = 0; 
            K = K + PRIME; 
         ENDDO; 
      ENDDO; 
   ENDDO; 

ENDDO; 

STOP; 

END. 
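
The sieve used in the benchmark stores flags for odd numbers only: flag I stands for the number 2*I + 1, and the multiples of that number are struck out at a stride of PRIME starting from I + PRIME. A minimal Python sketch of one pass, with a hypothetical function name and index 0 of the flag list left unused so that the indexing matches the listings:

```python
def sieve_benchmark(size=8191):
    """Return the odd primes 2*i + 1 for i in 1..size, BYTE-benchmark style."""
    flags = [True] * (size + 1)      # flags[i] represents the odd number 2*i + 1
    primes = []
    for i in range(1, size + 1):
        if flags[i]:
            prime = i + i + 1
            primes.append(prime)
            # Strike out every odd multiple of 'prime' above it.
            for k in range(i + prime, size + 1, prime):
                flags[k] = False
    return primes
```

With the default size of 8191 flags, the pass covers the odd numbers from 3 to 16383 and finds 1899 primes, the largest being 16381, which fits in the 1900-element PRIMES array of the listings.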



C     BUBBLE SORT IN FORTRAN 
C     THE INTEGERS IN ARRAY 'A' ARE SORTED INTO 
C     ASCENDING ORDER BY CONTINUALLY MOVING THE 
C     'NEXT' LARGEST ITEM TO ITS PROPER POSITION 
C     IN THE ORDERING. THE ALGORITHM IS IMPROVED 
C     BY CHECKING EACH TIME THROUGH THE SORT TO 
C     SEE IF ANY EXCHANGES HAVE BEEN MADE. IF 
C     NONE ARE MADE THEN THE PROGRAM TERMINATES. 

      SUBROUTINE BUBBLE 

      INTEGER I,N,XCHANG,TEMP,A 

      COMMON /WCS/ I,N,XCHANG,TEMP,A(10000) 

      XCHANG = .TRUE. 

      DO WHILE (XCHANG .EQ. .TRUE.) 

         XCHANG = .FALSE. 
         N = N-1 
         DO I = 1,N 
            IF (A(I) .GT. A(I+1)) THEN 
               TEMP = A(I) 
               A(I) = A(I+1) 
               A(I+1) = TEMP 
               XCHANG = .TRUE. 
            END IF 
         END DO 

      END DO 

      END ! OF BUBBLE 



\ THIS IS THE BUBBLE SORT IN JRS HLL 

ARRAY 'A' HOLDS THE INTEGERS TO BE SORTED. \ 

PROGRAM BUBL; 

INTEGER I,N,XCHANG,TEMP; 

INTEGER ARRAY A(10000); 

XCHANG = 1; 

DO WHILE (XCHANG.NE.0); 

   N = N-1; 
   XCHANG = 0; 

   DO I=1 TO N; 
      IF (A(I).GT.A(I+1)) THEN 
      DO; 
         TEMP = A(I); 
         A(I) = A(I+1); 
         A(I+1) = TEMP; 
         XCHANG = 1; 
      ENDDO; 
   ENDDO; 

ENDDO; 

STOP; 
END. 
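
The early-exit improvement described in the comments (stop as soon as a pass makes no exchange, and shorten each pass by one because the largest remaining item has already settled) looks like this in a Python sketch. The function name is hypothetical and indexing is 0-based.

```python
def bubble_sort(a):
    """Sort list 'a' in place into ascending order and return it."""
    n = len(a)
    exchanged = True
    while exchanged and n > 1:
        exchanged = False
        n -= 1                       # the last item of the previous pass is in place
        for i in range(n):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanged = True
    return a
```

On input that is already sorted the loop ends after a single pass, which is the case the exchange flag is meant to catch.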



APPENDIX D 



BIT MANIPULATION ALGORITHMS 



C     BIT MANIPULATION PROGRAM IN FORTRAN 
C     ARRAY 'A' HOLDS THE PREGENERATED VALUES TO 
C     BE MANIPULATED. 'N' HOLDS THE NUMBER OF 
C     TIMES THE MANIPULATION WILL OCCUR FOR TIMING 
C     PURPOSES. 

      SUBROUTINE BITMANIP 

      INTEGER I,N,POT,A 

      COMMON /WCS/ I,N,POT,A(100000) 


DO '4 0 0 


1 = 1/ 


vl 






A ( I ) 


z 


U A N H ( A ( 1 ) , A U) ) 




A ( I ) 


= 


j mi ( a ( m 




A C I ) 


= 


JNu I ( A ( n ) 




ATI) 


r 


J I A h S ( A ( I ) ) 




a c n 


z 


JTHIT-S(A( | ),0,3£1 




A (. I ) 


z 


1 A '0 ( A ( I) , A ( I ) ) 




ATM 


z 


l JH ( A ( T ) , A ( M) 




All) 


z 


USMFTf(A( I ) ,be, 3?) 


E ID DO 








t 10 


1 OF w 


I r 


M A | | P 



\ BIT MANIPULATION PROGRAM IN JRS HLL. ARRAY 'A' 
HOLDS THE VALUES TO BE MANIPULATED. N IS THE 
TOTAL NUMBER OF TIMES THE ITEMS WILL BE 
MANIPULATED. 'I' IS THE LOOPING VARIABLE. \ 

PROGRAM BITMANIP; 

INTEGER I,N,ROT; 

INTEGER ARRAY A(100000); 

DO I = 1 TO N; 



A ( I) 


= 


All) 


. AND 


. A ( n ; 


A ( I ) 


r 


All) 


. XU R 


. ( MASK 1 51 , 0 ) ) ; 


All) 


= 


A ( I ) 


. x 0 R 


. (MASK (31,0)); 


A ( I ) 


r 


AOS (A (!) ) 


t 


III) 


= 


SRI. ( ( A ( I ) 


.AND. (MASK (31 


A(I> 


- 


All) 


. A N 1 


. A ( I ) ; 


A(I) 


- 


A ( I ) 


.OR. 


A ( I ) ; 


AIT) 




RLL C A 


( I ) / 


3?) ; 



E’VDOO ; 

s ro 3 ; 

E-O. 
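
The bit manipulation benchmark applies a fixed sequence of word-level operations (AND, OR, exclusive OR with a mask, shifts and rotates) to every array element. The Python sketch below shows how such primitives behave on 32-bit words. The semantics assumed here for `mask`, `srl` and `rll` (a run of one bits from a high to a low position, a logical right shift, and a left rotate) are guesses based on the operator names visible in the listing, not definitions taken from the JRS HLL manual.

```python
MASK32 = 0xFFFFFFFF                  # keep results within one 32-bit word


def mask(hi, lo):
    """Word with one bits in positions hi..lo inclusive (assumed meaning of MASK)."""
    return (((1 << (hi - lo + 1)) - 1) << lo) & MASK32


def srl(x, n):
    """Logical right shift of a 32-bit word by n positions (assumed meaning of SRL)."""
    return (x & MASK32) >> n


def rll(x, n):
    """Rotate a 32-bit word left by n positions (assumed meaning of RLL)."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32
```

Under these assumptions, exclusive OR with `mask(31, 0)` complements every bit of the word, so applying it twice restores the original value, and rotating by the full word length of 32 leaves the word unchanged.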



      SUBROUTINE BITREV 

C************************************************* 
C     X - COMPLEX ARRAY X(2**M) 
C     M - NUMBER OF POINTS 
C 
C     BASED UPON AN FFT FIRST DEVELOPED BY SIGNALS 
C     SCIENCE CORPORATION FOR PROJECT SALESCLERK. 
C     FIRST TRANSCRIBED BY LCDR C LAURVICK, USN 
C     MODIFIED BY LT M HARTONG, USN 
C************************************************* 

      COMPLEX X(4096),T 
      INTEGER N,NV2,NM1,M,J,K 

C     REARRANGE ARRAY - BIT REVERSAL 

      N = 2**M 
      NV2 = N/2 
      NM1 = N-1 
      J = 1 

      DO 30 I = 1,NM1 
         IF (I .GE. J) GO TO 25 
         T = X(J) 
         X(J) = X(I) 
         X(I) = T 
   25    K = NV2 
   26    IF (K .GE. J) GO TO 30 
         J = J-K 
         K = K/2 
         GO TO 26 
   30 J = J+K 

      RETURN 
      END 



PROGRAM BITREV; 

\ ************************************************ 
  BIT REVERSAL FOR FFT - AMGS HLL VERSION 
  ************************************************ 
  BASED UPON AN FFT FIRST DEVELOPED BY SIGNALS 
  SCIENCE CORP. FOR PROJECT SALESCLERK. 
  TRANSLATED INTO HLL FOR THE WCS BY LT M HARTONG 
  ************************************************ \ 

INTEGER I,J,N,M,L,LE0,LE1,IP,KOUNT,K, 
        NV2,NM,NM1,P; 

REAL UREAL,UIMAG,PHASE,COSX,SINX,TREAL,TIMAG, 
     T2REAL,T2IMAG,TMP,PI,R1,R2,R3,K3; 

REAL ARRAY XREAL(4096),XIMAG(4096); 

\ REPEAT 30 TIMES FOR TIMING PURPOSES \ 

DO L = 1 TO 30; 

   \ N = 2 ** M \ 

   N = 1; 
   DO KOUNT = 1 TO M; 
      N = N * 2; 
   ENDDO; 

   \ INITIALIZE THE CONSTANTS \ 

   NV2 = N/2; 
   NM = N - 1; 
   J = 1; 

   \ REARRANGE ARRAY - BIT REVERSAL \ 

   DO I = 1 TO NM; 
      IF (I.GE.J) THEN GOTO 25; 
      TREAL = XREAL(J); 
      TIMAG = XIMAG(J); 
      XREAL(J) = XREAL(I); 
      XIMAG(J) = XIMAG(I); 
      XREAL(I) = TREAL; 
      XIMAG(I) = TIMAG; 
25:   K = NV2; 
26:   IF (K.GE.J) THEN GOTO 30; 
      J = J-K; 
      K = K/2; 
      GOTO 26; 
30:   J = J+K; 
   ENDDO; 

ENDDO; \ END OF LOOP L \ 

STOP; 

END. 



APPENDIX E 



SAMPLE HARNESS SETUP 



      PROGRAM FACTORIAL 

C     THIS IS THE FACTORIAL PROGRAM, FORTRAN VERSION 
C     AN INTEGER IS READ AND THE FACTORIAL OF THAT 
C     INTEGER IS PRINTED. THE FACTORIAL OF THAT 
C     NUMBER IS CALCULATED 100,000 TIMES BEFORE BEING 
C     PRINTED FOR TIMING PURPOSES. 

      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, 
     1        TIMER, HANDLE, IRET, INST 

      INTEGER TIMES, C, V, T, ANS, ONE 

      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE 



   10 FORMAT (/,' ENTER AN INTEGER BETWEEN 1 AND 12 ',/) 
   20 FORMAT (' THE FACTORIAL OF ',I2,' IS ',I9) 
   30 FORMAT (/,' FACTORIAL USING FORTRAN SUBROUTINE WITH 
     1 COMMON'/) 
   40 FORMAT (/,' FACTORIAL USING JRS HLL WITH COMMON'/) 
   50 FORMAT (/,' FACTORIAL USING STRAIGHT FORTRAN CODE 
     1 WITH COMMON'/) 
   60 FORMAT (/,' CPU TIME = ',F5.2,' SECONDS'/) 
   70 FORMAT (I2) 
   80 FORMAT (/,' FACTORIAL USING STRAIGHT FORTRAN 
     1 WITHOUT COMMON'/) 



C     READ WHAT FACTORIAL TO DETERMINE 

      WRITE (6,10) 
      READ (5,70) TEMP 

C     INITIALIZE THE CONSTANTS 

      TIMES = 100000 
      COUNT = TIMES        ! NUMBER OF TIMES TO EXECUTE LOOP 
      ZERO = 0 
      TOTAL = 1 
      ONE = 1 
      C = TIMES 
      T = TEMP 

C     THIS PART IS STRAIGHT FORTRAN WITHOUT COMMON 

      WRITE (6,80) 

      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR 

      DO WHILE (C .GT. 0) 
         C = C - 1 
         V = T 
         ANS = 1 
         DO WHILE (V .GT. 0) 
            ANS = ANS * V 
            V = V - 1 
         END DO 
      END DO 

      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR 

      WRITE (6,20) T, ANS 
      WRITE (6,60) FLOAT(TIMER)/100.0 



C     THIS IS THE STRAIGHT FORTRAN CODE VERSION WITH COMMON 

      WRITE (6,50) 

      COUNT = TIMES 
M I AL = ..) 



      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR 
      DO WHILE (COUNT .GT. 0) 
         COUNT = COUNT - 1 
         VALUE = TEMP 
         TOTAL = ONE 
         DO WHILE (VALUE .GT. 0) 
            TOTAL = TOTAL * VALUE 
            VALUE = VALUE - 1 
         END DO 
      END DO 

      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR 

      WRITE (6,20) TEMP, TOTAL 
      WRITE (6,60) FLOAT(TIMER)/100.0 



C     THIS PART IS A SUBROUTINE CALL IN FORTRAN WITH COMMON 

      COUNT = TIMES 
      TOTAL = ONE 

      WRITE (6,30) 

      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR 
      CALL FAC 
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR 

      WRITE (6,20) TEMP, TOTAL 
      WRITE (6,60) FLOAT(TIMER)/100.0 

C     THIS PART USES JRS HLL WITH COMMON 

      TOTAL = ONE 
      COUNT = TIMES 

      WRITE (6,40) 

      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR 
      CALL XFCC(TOTAL, IRET, INST) 
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR 

      WRITE (6,20) TEMP, TOTAL 
      WRITE (6,60) FLOAT(TIMER)/100.0 

      END 
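
The pattern the harness follows for each variant (reset the inputs, start a timer, run the routine a fixed number of times, read the elapsed CPU time, print the result) can be sketched in Python. This is an illustration only: `time.process_time` stands in for the VMS timer calls, and the function names `factorial_loop` and `time_repeated` are hypothetical.

```python
import time


def factorial_loop(value):
    """Compute value! with the same countdown loop used in the benchmarks."""
    total = 1
    while value > 0:
        total *= value
        value -= 1
    return total


def time_repeated(func, arg, times=100_000):
    """Call func(arg) 'times' times; return (last result, CPU seconds used)."""
    start = time.process_time()      # CPU time, comparable in spirit to the timer calls
    result = None
    for _ in range(times):
        result = func(arg)
    elapsed = time.process_time() - start
    return result, elapsed
```

A call such as `time_repeated(factorial_loop, 12)` returns 12! together with the CPU time for 100,000 repetitions. Repeating the work many times is what makes a very short routine measurable with a coarse timer.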

C     THE FACTORIAL SUBROUTINE IN FORTRAN 

      SUBROUTINE FAC 

      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, ONE 
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE 

      DO WHILE (COUNT .GT. ZERO) 

         COUNT = COUNT - 1 
         VALUE = TEMP 
         TOTAL = ONE 

         DO WHILE (VALUE .GT. ZERO) 
            TOTAL = TOTAL * VALUE 
            VALUE = VALUE - 1 
         END DO 

      END DO 

      END ! OF FAC 



\ FACTORIAL PROGRAM IN JRS HLL \ 

PROGRAM FACTORIAL; 

INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, I; 

DO WHILE (COUNT .GT. ZERO); 
   COUNT = COUNT - 1; 
   VALUE = TEMP; 
   TOTAL = 1; 

   DO WHILE (VALUE .GT. ZERO); 
      TOTAL = TOTAL * VALUE; 
      VALUE = VALUE - 1; 
   ENDDO; 

ENDDO; 

STOP; 

END. 



C     SUBROUTINE ERR IS USED FOR SIGNALING ERRORS FROM 
C     THE TIMING MECHANISM. 

      SUBROUTINE ERR 

      WRITE (6,102) 

  102 FORMAT (' PROBLEM WITH THE LIBRARY CALL') 

      END ! OF ERR 